During the autumn of 2015, I was developing a platform to stream short live video feeds. Streaming live video comes with several problems, which I describe below. The final solution is a platform with 4 services:
- Live Streaming service
- VOD Streaming service
- Real-time Communication service
- REST API service
Our live videos are short: 20 seconds. Our client app uses RTMP to upload a live video continuously, and we use RTMP to deliver live streams as well. We do not transcode live streams. The live video latency on the watcher’s side is about 5 seconds. I developed the platform in two months, starting with zero knowledge of streaming.
To begin with, what is video streaming?
What is video streaming from a user’s point of view?
According to Wikipedia, streaming media is the uninterrupted delivery and presentation of media from a provider to an end user. In other words, the media is immediately available: the user does not have to download the entire file to consume it. Live streaming takes this a step further, as the media is being produced at the same time, very much like live TV. Getting the media from “a” to “b” using established services, however, can be problematic.
So what is the problem with existing streaming solutions?
The trade-off between latency and quality. Latency (aka initial “startup” latency) is evil. On the one hand, sending I-frames (aka keyframes) more frequently reduces startup latency, because a player can only start decoding from a keyframe. On the other hand, it increases the bitrate, the media file size, and the traffic over the network. So, to keep the bitrate from growing, you have to reduce quality.
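A back-of-the-envelope model makes the trade-off visible. This is a minimal sketch with made-up numbers: the keyframe and delta-frame sizes below are illustrative assumptions, not measurements from our platform.

```python
# Back-of-the-envelope model of the keyframe-interval trade-off.
# A viewer who joins mid-stream must wait for the next keyframe before
# decoding can start, so the worst-case startup wait equals the keyframe
# interval. But keyframes are much larger than delta frames, so shorter
# intervals inflate the bitrate.

def startup_wait_worst_case(keyframe_interval_s: float) -> float:
    """Worst-case seconds a new viewer waits before the first decodable frame."""
    return keyframe_interval_s

def bitrate_kbps(keyframe_interval_s: float,
                 fps: float = 30.0,
                 keyframe_kb: float = 30.0,  # assumed size of one I-frame
                 delta_kb: float = 2.0       # assumed size of one P-frame
                 ) -> float:
    """Approximate stream bitrate for a given keyframe interval."""
    frames_per_gop = fps * keyframe_interval_s
    gop_kb = keyframe_kb + (frames_per_gop - 1) * delta_kb
    return gop_kb * 8 / keyframe_interval_s  # kilobits per second

for interval in (0.5, 2.0, 5.0):
    print(f"interval={interval}s  wait<={startup_wait_worst_case(interval)}s  "
          f"~{bitrate_kbps(interval):.0f} kbps")
```

With these assumed frame sizes, shrinking the keyframe interval from 5 seconds to 0.5 seconds almost doubles the bitrate, which is exactly the quality/traffic cost described above.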
Technical background: an overview of technologies and services
Video streaming is a mature field; there are several protocols and services that provide video streaming and live streaming. I will talk about the well-known ones.
There are 3 well-known protocols for live streaming: RTMP, RTSP, and WebRTC.
We need to choose a protocol that supports recording a video stream. We require this feature to provide VOD and to review videos that users report as abuse. That’s why WebRTC does not work for our case.
Since I was new to video streaming, I ended up using RTMP. There are plenty of guides on setting up RTMP; getting started with RTSP is not as easy.
There are several major streaming solutions on the market:
All of them are great for a different problem: providing a professional streaming service. These services handle plenty of viewers per stream, but all of them have the latency problem. A viewer watches a video with around a 10-second delay; for us, that’s too much.
I must admit that Wowza Streaming Engine is awesome, but:
- Wowza Streaming Cloud had a poorly documented API at the time. It is nice now; please take a look at the developer’s page.
The main difference between the two of them is performance. For more details, please take a look at srs’s readme. A minor difference for us was that srs provides an RTSP endpoint to record an RTSP stream.
Both of them provide the main features for a streaming service:
- push an RTMP stream (provide an endpoint to publish a video)
- deliver an RTMP stream (provide an endpoint to watch a video)
- record a stream to disk
- serve an HTTP API to get stream info and control client connections
- send HTTP hooks on stream stages
- transcode a stream with FFmpeg
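As an illustration of the HTTP-hooks feature, here is a sketch of a backend handler for the on_publish hook that servers such as nginx-rtmp send when a client starts pushing a stream. The form-encoded `name` parameter follows nginx-rtmp’s callback format; the stream keys and the validation rule are illustrative assumptions, not our production logic.

```python
# Sketch of an "on_publish" HTTP hook handler. When a client starts
# pushing an RTMP stream, the streaming server POSTs a form-encoded
# body (nginx-rtmp includes "app", "name", "addr", ...). Returning a
# 2xx status allows publishing; any other status rejects the client.

from urllib.parse import parse_qs

VALID_STREAM_KEYS = {"abc123", "def456"}  # illustrative; normally looked up in a DB

def on_publish(body: str) -> int:
    """Return the HTTP status for an on_publish hook: 200 allows, 403 rejects."""
    params = parse_qs(body)
    stream_key = params.get("name", [""])[0]
    return 200 if stream_key in VALID_STREAM_KEYS else 403

print(on_publish("app=live&name=abc123"))  # allowed
print(on_publish("app=live&name=evil"))    # rejected
```

The same hook is a convenient place to mark a stream as live in the database and to trigger the “ready to watch” notification described later.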
How to solve the performance trade-off?
There are several solutions for the trade-off:
- HLS with multi-bitrate transcoding (also known as adaptive bitrate streaming)
- HLS with single-bitrate transcoding
- RTMP without transcoding
The first one provides the best quality from the user’s point of view. Most browsers and mobile devices support HLS. But it has one vital drawback: a huge delay before the first frame.
The second one is quite good. On the one hand, it avoids the delay of the first solution. On the other hand, it brings lags and “loading” screens to users on slow networks.
The last one is the best for mobile apps or desktop clients, but in a browser it requires an RTMP client such as Flash. On the one hand, it has the lowest delay before the first frame; on the other hand, the Flash requirement is a huge drawback.
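To make the delay difference concrete, here is a rough estimate. The segment length, buffered segment count, and keyframe interval are illustrative assumptions, not measured values from our setup.

```python
# Rough startup-delay model for HLS vs plain RTMP.
# An HLS player typically buffers several whole segments before playback
# starts, so its startup delay is roughly segment_duration * buffered_segments.
# An RTMP player can start as soon as it receives a keyframe, so its delay
# is bounded by the keyframe interval plus network overhead.

def hls_startup_delay(segment_duration_s: float, buffered_segments: int = 3) -> float:
    """Approximate HLS startup delay in seconds."""
    return segment_duration_s * buffered_segments

def rtmp_startup_delay(keyframe_interval_s: float, network_overhead_s: float = 1.0) -> float:
    """Approximate worst-case RTMP startup delay in seconds."""
    return keyframe_interval_s + network_overhead_s

print(hls_startup_delay(3.0))   # roughly 9 s with 3-second segments
print(rtmp_startup_delay(2.0))  # roughly 3 s with a 2-second keyframe interval
```

Under these assumptions the model lines up with what we observed: around 8 seconds for HLS and around 5 seconds for RTMP.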
For more technical detail, read the next 3 sections. After them, I describe our solution for live streaming.
HLS with multi-bitrate transcoding
First, we tried to deliver a live stream via HLS with multi-bitrate transcoding.
The minimal delay we could achieve for HD video over HLS was approximately 8 seconds, and it required a powerful AWS EC2 instance: c4.8xlarge.
So, the cost to scale out was too high. To process one live stream we required 1 CPU!
By the way, we also tried c3.xlarge, m3.xlarge, and m4.large instances.
Then we tried the second approach.
HLS with single-bitrate transcoding
The idea was to use HLS without the multi-bitrate feature to reduce the load on the backend and cut costs.
And it worked! The backend handled 100 concurrent live streams with a minimal delay of around 8 seconds.
As you can guess, that is not what one expects from a live stream.
So, we tried the third approach.
Lesson learned: multi-bitrate transcoding puts too much load on a server.
RTMP without transcoding
After two attempts with the HLS protocol, we decided to give RTMP a chance.
This approach showed the best performance! The delay with 100 concurrent live streams was around 5 seconds!
Also, without transcoding, we can scale out smoothly to provide the best performance.
We also switched to a t2.medium AWS EC2 instance. It works well.
Overview of a general video streaming platform architecture, like Periscope or Look
The final solution we developed is a live streaming platform with 3 services:
- Streaming service;
- Web Service;
- Real-time communication service (aka WebSocket service with a REST API for control).
The diagram below describes the system architecture.
A square is a service, a rounded square is a message between services, and an arrowed line is a link.
A client sends requests to each of the services:
- To the web service, to grant access to the streaming service and the WebSocket;
- To the streaming service, to publish a live stream and to subscribe to a live stream or a VOD stream;
- To the real-time communication service, to receive messages from the web service.
The reason to introduce the real-time communication service is to isolate a publisher-user from a subscriber-user while still providing a good user experience: a subscriber-user gets a message via WebSocket when a live stream is ready to watch.
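The “ready to watch” notification can be sketched as follows. The event name, field names, and URL scheme here are illustrative assumptions, not our actual wire format.

```python
import json

# Sketch of the notification the web service pushes through the
# real-time communication service once a live stream becomes watchable.
# A subscriber waiting in the "requesting stream -> waiting for a
# publisher" state transitions to "watching" when this message arrives.

def stream_ready_message(stream_id: str, rtmp_host: str = "streaming.example.com") -> str:
    """Build the JSON message a subscriber receives over the WebSocket."""
    return json.dumps({
        "event": "live_stream_ready",
        "stream_id": stream_id,
        "play_url": f"rtmp://{rtmp_host}/live/{stream_id}",
    })

msg = json.loads(stream_ready_message("abc123"))
print(msg["event"], msg["play_url"])
```

Because the message carries the play URL, the subscriber never talks to the publisher directly; the two users stay isolated, as described above.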
What problems did we encounter during development?
We’ve encountered several problems during development:
- Adaptive-bitrate streams are too heavy for the backend (they require too much CPU time);
- It’s reasonable only in the case of VOD streams or broadcast live streams;
- It’s not reasonable for short live streams.
- HLS is too heavy for the backend;
- It’s reasonable only in the case of VOD streams or broadcast live streams;
- It’s not reasonable for short live streams.
- Users require a good experience of the application flow: requesting a stream → waiting for a publisher → watching a live stream.
- That’s why we require the real-time communication service.
- There is no good guide for our case: a lot of concurrent one-to-one live streams.
- There are plenty of guides and recipes for tuning an RTMP server for broadcasting one stream to many subscribers.
- Publishing a stream to the server is too hard on a mobile device’s battery.