Explore a simple video streaming service architecture
A video streaming service transmits video data in real time. To do so, it relies on technology that compresses and delivers the data fast enough to keep up with playback, covering video capture, compression (encoding), decoding, and playback.
Netflix, Disney+, YouTube, and Twitch are well-known video streaming services. They are used by many people around the world and process large amounts of data. As networks have improved, streaming services have advanced as well, and services offering high-resolution video such as 4K and 8K are emerging.
I was curious how these services are structured, so let's look at a simple video streaming service architecture. The structure varies from service to service; this article covers only a simplified version.
Real services are far more complex and use a wider variety of technologies, so please treat this article as a reference only.
Below is a simple video streaming service architecture.
A video streaming service consists broadly of three parts: SendDevice, which transmits video data; Server, which processes the video data; and WatchDevice, which receives the video data.
Each performs the following roles.
SendDevice: Responsible for capturing and compressing video data.
Server: Responsible for processing and transmitting video data.
WatchDevice: Responsible for receiving and playing video data from the server.
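To make the three roles concrete, here is a toy sketch of the data flow in Python. It is only an illustration under loud assumptions: the function names `send_device`, `server`, and `watch_device` are made up for this example, `zlib` stands in for a real video codec, and a "frame" is just a byte string.

```python
import zlib


def send_device(raw_frames):
    """SendDevice: 'capture' frames and compress them.

    zlib is a stand-in here; a real sender would use a video codec.
    """
    return [zlib.compress(frame) for frame in raw_frames]


def server(compressed_frames, frames_per_segment=2):
    """Server: process the stream by grouping frames into segments."""
    return [
        compressed_frames[i:i + frames_per_segment]
        for i in range(0, len(compressed_frames), frames_per_segment)
    ]


def watch_device(segments):
    """WatchDevice: receive segments, decode them, and 'play' in order."""
    return [zlib.decompress(frame) for segment in segments for frame in segment]


# Five fake frames of 1000 bytes each.
frames = [bytes([i]) * 1000 for i in range(5)]
segments = server(send_device(frames))
played = watch_device(segments)
print(len(segments), played == frames)
```

The point is only the shape of the pipeline: data is compressed once at the sender, regrouped on the server, and decoded at the viewer.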
When someone broadcasts on a service such as YouTube or Twitch, SendDevice is the broadcaster's computer or mobile device.
SendDevice is responsible for capturing and compressing video data. Because capture and compression happen in real time, both speed and stability are required.
Video data is compressed before it is sent to the server in order to save network bandwidth, since bandwidth is expensive. Video is a sequence of images, so compressing those images (and the redundancy between consecutive ones) greatly reduces the amount of data to send.
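A quick back-of-the-envelope calculation shows how much bandwidth is at stake. The sketch below computes the bitrate of uncompressed 1080p video at 30 frames per second with 24 bits per pixel. The 5 Mbps compressed figure is an assumed, illustrative target, not a number from any particular service.

```python
def raw_bitrate_bps(width, height, fps, bits_per_pixel=24):
    """Bits per second needed to send every pixel of every frame uncompressed."""
    return width * height * bits_per_pixel * fps


raw = raw_bitrate_bps(1920, 1080, 30)
compressed = 5_000_000  # assumed compressed bitrate, for illustration only

print(f"raw: {raw / 1e6:.0f} Mbps")            # raw: 1493 Mbps
print(f"ratio: {raw / compressed:.0f}x")       # ratio: 299x
```

Under these assumptions, the uncompressed stream needs roughly 1.5 Gbps, which is far more than most home upload links can carry.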
Video is compressed with video codecs such as H.264 and H.265, and audio with audio codecs such as AAC and MP3.
SendDevice then sends the compressed video data to the server.
In some cases, however, SendDevice sends video data to the server without compressing it. The server must then perform the compression itself as an additional step.
Server is responsible for processing and transmitting video data: it receives video data, processes it, and transmits it to WatchDevice.
Server may be a single server or multiple servers. Large-scale services use many servers.
The server prepares the video in advance at several resolutions, such as 540p, 720p, 1080p, and 4K, and transmits the appropriate resolution when a user requests it.
The appropriate resolution depends on the user's network conditions.
When the resolution matches what the user's network can sustain, the user can start watching quickly and with fewer stalls.
This technique of matching the transmitted resolution (and therefore bitrate) to the user's network conditions is called Adaptive Bitrate Streaming.
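The selection step can be sketched as picking the highest-bitrate version that fits within the measured bandwidth. Everything specific below is an assumption for illustration: the `Rendition` type, the bitrates in `LADDER`, and the 0.8 safety factor are hypothetical values, and real players use more elaborate heuristics. With HLS and MPEG-DASH, this decision is typically made by the player on the viewer's device.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rendition:
    name: str
    bitrate_bps: int


# Hypothetical bitrate ladder; real services tune these values per title.
LADDER = [
    Rendition("540p", 1_500_000),
    Rendition("720p", 3_000_000),
    Rendition("1080p", 6_000_000),
    Rendition("4K", 16_000_000),
]


def pick_rendition(measured_bps, ladder=LADDER, safety=0.8):
    """Return the best rendition whose bitrate fits within the bandwidth budget.

    Only a fraction (safety) of the measured bandwidth is used, leaving
    headroom for fluctuations. If nothing fits, fall back to the lowest one.
    """
    budget = measured_bps * safety
    affordable = [r for r in ladder if r.bitrate_bps <= budget]
    if not affordable:
        return min(ladder, key=lambda r: r.bitrate_bps)
    return max(affordable, key=lambda r: r.bitrate_bps)


print(pick_rendition(5_000_000).name)   # 720p
print(pick_rendition(1_000_000).name)   # 540p
print(pick_rendition(25_000_000).name)  # 4K
```

Re-running this choice periodically, as the measured bandwidth changes, is what makes the streaming "adaptive".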
The server receives and processes video data as follows.
It receives the video data and divides it into segments.
A segment is the unit in which video data is divided and transmitted. The video is cut into segments of a fixed length, and WatchDevice receives and plays them one after another.
The process of dividing the incoming video data into segments is called segmentation.
Segments are delivered using protocols such as HTTP Live Streaming (HLS) or MPEG-DASH.
If the video were not divided, the user would have to wait for much more data before playback could begin. With segments, playback can start as soon as the first segments arrive.
It sends the segments to a CDN.
CDN stands for Content Delivery Network.
A CDN consists of servers distributed around the world, so it can deliver content quickly from a location close to each user.
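The two steps above can be sketched in a few lines: work out the segment lengths, then write an HLS media playlist that lists where each segment lives on the CDN. This is a minimal sketch, not how any particular service does it. The 6-second target, the `segment_00000.ts` naming scheme, and the `cdn.example.com` URL are assumptions for illustration, and actually cutting the video into files is left to a real packaging tool.

```python
import math


def segment_durations(total_seconds, target_seconds=6.0):
    """Split a video of total_seconds into fixed-length segments.

    The last segment holds whatever is left over and may be shorter.
    """
    if total_seconds <= 0:
        raise ValueError("total_seconds must be positive")
    full = int(total_seconds // target_seconds)
    durations = [target_seconds] * full
    remainder = total_seconds - full * target_seconds
    if remainder > 1e-9:
        durations.append(round(remainder, 3))
    return durations


def build_media_playlist(durations, base_url):
    """Build an HLS media playlist (VOD style) for the given segments."""
    lines = [
        "#EXTM3U",
        "#EXT-X-VERSION:3",
        f"#EXT-X-TARGETDURATION:{math.ceil(max(durations))}",
        "#EXT-X-MEDIA-SEQUENCE:0",
    ]
    for index, duration in enumerate(durations):
        lines.append(f"#EXTINF:{duration:.3f},")
        # Hypothetical naming scheme for segment files on the CDN.
        lines.append(f"{base_url}/segment_{index:05d}.ts")
    lines.append("#EXT-X-ENDLIST")  # marks the playlist as complete
    return "\n".join(lines) + "\n"


durations = segment_durations(20.0)
playlist = build_media_playlist(durations, "https://cdn.example.com/video123/720p")
print(durations)  # [6.0, 6.0, 6.0, 2.0]
print(playlist)
```

Because each segment is an ordinary file behind a URL, a CDN can cache and serve it like any other static content.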
WatchDevice is responsible for receiving and playing video data from the server. It is the computer or mobile device of the user watching the video.
WatchDevice receives segments from the CDN. As described above, a segment is a fixed-length piece of the video, delivered using a protocol such as HTTP Live Streaming (HLS) or MPEG-DASH.
WatchDevice decodes each segment it receives and plays it. Decoding converts the compressed video data back into displayable frames, and playback displays those frames on the screen.
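The receive, decode, and play loop on WatchDevice can be sketched as follows. This is a simplified illustration, assuming an HLS media playlist as input. The `fetch`, `decode`, and `render` callables are hypothetical placeholders for the network download, the video decoder, and the screen output; the sample playlist and its URLs are made up.

```python
SAMPLE_PLAYLIST = """#EXTM3U
#EXT-X-VERSION:3
#EXT-X-TARGETDURATION:6
#EXT-X-MEDIA-SEQUENCE:0
#EXTINF:6.000,
https://cdn.example.com/video123/720p/segment_00000.ts
#EXTINF:6.000,
https://cdn.example.com/video123/720p/segment_00001.ts
#EXTINF:2.000,
https://cdn.example.com/video123/720p/segment_00002.ts
#EXT-X-ENDLIST
"""


def parse_media_playlist(text):
    """Return a list of (duration_seconds, uri) pairs in playback order."""
    segments = []
    pending_duration = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if line.startswith("#EXTINF:"):
            # "#EXTINF:<duration>,<optional title>"
            pending_duration = float(line[len("#EXTINF:"):].split(",")[0])
        elif line and not line.startswith("#") and pending_duration is not None:
            segments.append((pending_duration, line))
            pending_duration = None
    return segments


def play(playlist_text, fetch, decode, render):
    """Fetch, decode, and render each segment in order.

    Returns the total number of seconds played.
    """
    played_seconds = 0.0
    for duration, uri in parse_media_playlist(playlist_text):
        data = fetch(uri)      # download the segment from the CDN
        frames = decode(data)  # turn compressed data into frames
        render(frames)         # show the frames on screen
        played_seconds += duration
    return played_seconds


# Stand-in callables so the sketch runs without a network or a decoder.
total = play(
    SAMPLE_PLAYLIST,
    fetch=lambda uri: uri.encode(),
    decode=lambda data: [data],
    render=lambda frames: None,
)
print(total)  # 14.0
```

A real player would also download upcoming segments into a buffer while the current one is playing, so playback does not pause between segments.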