AV1 Is Ready for RTC Applications
The royalty-free codec standard AV1 gets plenty of attention, and for good reason. With some of the largest streaming video app, service, and platform companies in the world supporting the standard, the entire industry is keeping close tabs on it.
During the early adoption phase of a next-generation standard, the coding efficiency gains often come with a very real and significant penalty in compute time, at least until the armies of codec engineers focused on wringing out every bit of performance have completed their work. AV1 was no different. But my, how far we’ve come in just a few short years.
For video on demand (VOD), codec speed, though still important, can be deemphasized because the increased quality and/or significant bitrate reduction pay back the added compute cost many times over in bandwidth savings. But for real-time communications (RTC), and certainly for any application using WebRTC, the ability to deliver compression benefits with only a marginal increase in compute is not just desirable but essential.
AV1, because it has been optimized for real-time Screen Content Coding (SCC), is ideal for the RTC use case, which is why it has been deployed by a number of large enterprise customers, including Google for Duo and Cisco for Webex. Aside from video conferencing, screen content represents a significant percentage of online video, including remote desktop services, virtualized graphics, video editing solutions, computer-generated graphics applications, and live game streaming.
Screen Content Coding (SCC)
AV1 is the first video codec standard that includes Screen Content Coding (SCC) tools in its main body, meaning that every AV1 decoder must support these SCC features to be compliant. In comparison, the HEVC standard also specified SCC coding tools, but only in its extensions, meaning that not all decoders support them, which greatly limits their usefulness.
For screen content, Aurora1 demonstrates a clear coding efficiency advantage over encoders for other widely deployed codec standards, including H.264 and VP9, across all resolutions. Using a single core of a standard PC processor, Aurora1 can achieve a BD-rate saving of more than 80% compared to x264, meaning that Aurora1 needs less than a fifth of the bitrate consumed by x264 to achieve the same objective quality.
As shown in Figure 1 below, compared to x264 running in its superfast mode, Aurora1 achieves a BD-rate (PSNR) saving of 81.25% with a BD-PSNR improvement of 13.95 dB.
NOTE: In all comparisons, we used the following command options for x264: ffmpeg -threads 1 -r 30 -s 1920x1080 -c:v libx264 -x264-params bframes=0 -tune zerolatency -preset superfast -threads 1
The idea that AV1 is slow has been obliterated by Aurora1. As Figure 2 shows, the encoder can reach more than 45 FPS for 1080p screen content videos using just a single core of a standard PC processor while achieving more than 80% bitrate savings compared to x264. This confirms that AV1 is ready for deployment in low-latency RTC scenarios.
Temporal scalability is an important feature that is well suited to the RTC use case, where network conditions fluctuate significantly over time as a result of varying available bandwidth, RTT (round-trip time) delays, jitter, packet loss, etc. Video encoders need to adapt to the varying bandwidth conditions to make the most of the available network resources and provide the best possible Quality of Experience (QoE). When network congestion occurs, the video encoder must respond rapidly and reduce bitrate by altering frame rate (FPS), frame resolution, and QP level. Otherwise, viewers will see frozen video and suffer a disastrous UX.
Temporal scalability is an effective encoding tool that enables the bitstream to adapt to fluctuating network bandwidth. It allows frames to be dropped without affecting the decoding of the remaining frames, so the video keeps flowing without freezes or artifacts.
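To make the frame-dropping property concrete, here is a minimal sketch of a dyadic temporal layer pattern. It is an illustration under stated assumptions, not Aurora1's implementation: the function names (temporal_layer, reference_frame, drop_above) are hypothetical, and the reference structure (each frame predicts from exactly one earlier frame in the same or a lower layer) is the common dyadic pattern, which Aurora1 may or may not use.

```python
def temporal_layer(frame_index: int, num_layers: int) -> int:
    """Temporal layer ID of a frame in a dyadic pattern (0 is the base layer)."""
    period = 1 << (num_layers - 1)          # pattern repeats every 1, 2, 4, ... frames
    pos = frame_index % period
    if pos == 0:
        return 0                            # first frame of each pattern is base layer
    # Count trailing zero bits of pos: more trailing zeros means a lower layer.
    trailing_zeros = (pos & -pos).bit_length() - 1
    return num_layers - 1 - trailing_zeros


def reference_frame(frame_index: int, num_layers: int):
    """Index of the single earlier frame this frame predicts from (assumed structure)."""
    if frame_index == 0:
        return None                         # key frame, no reference
    period = 1 << (num_layers - 1)
    pos = frame_index % period
    # Base-layer frames reference the previous base-layer frame; other frames
    # step back by their lowest set bit, which lands on a lower layer.
    step = period if pos == 0 else (pos & -pos)
    return frame_index - step


def drop_above(num_frames: int, num_layers: int, max_layer: int):
    """Keep only frames with layer <= max_layer and check they remain decodable."""
    kept = [i for i in range(num_frames)
            if temporal_layer(i, num_layers) <= max_layer]
    kept_set = set(kept)
    decodable = all(reference_frame(i, num_layers) in kept_set
                    for i in kept if i != 0)
    return kept, decodable


# Three layers over eight frames: layer IDs are [0, 2, 1, 2, 0, 2, 1, 2].
print([temporal_layer(i, 3) for i in range(8)])
# Dropping the top layer halves the frame rate; every kept frame still has its reference.
print(drop_above(8, 3, max_layer=1))
```

Because no kept frame ever references a higher layer, discarding the top layer (or the top two) lowers the frame rate without breaking prediction for what remains.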
With temporal scalability enabled, Aurora1 provides a friendly API aligned with WebRTC and integrates easily with WebRTC for RTC applications. It allows flexible bitrate adaptation by adjusting the frame rate on the fly as network bandwidth fluctuates.
As shown in Figure 4, Aurora1 implements two temporal scalability frameworks to serve different user scenarios. For instance, if one bitstream serves multiple receivers with a wide range of bandwidth conditions, the three-layer framework may be considered.
WebRTC, the open framework that enables Real-Time Communications (RTC) on the web, includes three encoders, namely libvpx-VP8, libvpx-VP9, and Cisco’s OpenH264 H.264 encoder, all of which support temporal scalability. Temporal scalability is therefore a natural fit for WebRTC. Using it, however, generally comes at some cost in coding efficiency.
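The multi-receiver case can be sketched as a per-receiver layer selection: one scalable bitstream is produced, and each receiver is forwarded only the layers its bandwidth can carry. The sketch below is hypothetical throughout. The function names, the cumulative per-layer bitrates, and the receiver bandwidths are made-up illustration values, not Aurora1 measurements or API calls, and the frame rate formula assumes a dyadic layer structure.

```python
def select_max_layer(cumulative_kbps, available_kbps):
    """Highest temporal layer whose cumulative bitrate fits the available bandwidth.

    cumulative_kbps[i] is the bitrate of layers 0..i combined (non-decreasing).
    The base layer is always sent, even if it does not fit.
    """
    best = 0
    for layer, needed in enumerate(cumulative_kbps):
        if needed <= available_kbps:
            best = layer
    return best


def layer_frame_rate(full_fps, num_layers, max_layer):
    """Frame rate seen by a receiver, assuming each dropped layer halves the rate."""
    return full_fps / (1 << (num_layers - 1 - max_layer))


# Hypothetical three-layer stream: layers 0, 0-1, and 0-2 cost 120, 180, 240 kbps.
cumulative = [120, 180, 240]
receivers = {"phone": 100, "laptop": 200, "desktop": 1000}  # available kbps (made up)

for name, bandwidth in receivers.items():
    layer = select_max_layer(cumulative, bandwidth)
    fps = layer_frame_rate(30, len(cumulative), layer)
    print(f"{name}: forward layers 0..{layer} at {fps} FPS")
```

With three layers there are three operating points to choose from instead of two, which is why a wider spread of receiver bandwidths favors the three-layer framework.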
As shown in Figure 5, when the temporal scalability feature is turned on, Aurora1’s coding performance drops compared to the non-scalable scenario, but still maintains a distinct advantage compared to x264.
Aurora1 for Camera Content
Camera-captured content is the dominant content type in video conferencing, distance learning, telemedicine, and similar scenarios. Aurora1 has made breakthroughs to demonstrate that it’s ready for deployment in such RTC scenarios, starting with PC platforms. We continuously optimize Aurora1 so that it can be deployed across all platforms and cover a wide range of use cases.
For the results shown in Figure 6 and Figure 7, we used a large test set of 480p and 720p camera-captured video conferencing clips. We ran Aurora1 and x264 on an AMD Ryzen 9 3900X 12-Core Processor (12C24T) using 2 threads.
Following are the command options for x264: --preset medium --nal-hrd none --preset medium --profile main --threads 2 --tune zerolatency --no-psy --aq-mode 0 --no-scenecut
Table 1 compares Aurora1 against x264 medium: Aurora1 superfast saves more than 20% of the bitrate at the same objective quality while encoding more than 30% faster.
Aurora1 for RTC: Current Performance over Mobile Platforms
The Aurora1 real-time video communications (RTC) mobile encoder will be released in early 2021. To show the benefit of the current Aurora1 version against libvpx-VP9 and x264, we operated the encoders in a zero-latency configuration using a single thread on a Snapdragon 845 mobile device, encoding a mixed-content set of forty 180p videos at target bitrates from 50 kbps to 200 kbps.
Figure 8 and Figure 9, with Table 2, demonstrate our superior performance today. In Figure 8, the Y-axis indicates BD-rate in PSNR compared to x264 medium. A negative BD-rate means that a lower bitrate (by that percentage) is needed to achieve the same objective quality as the anchor encoder. Hence, the lower the curve, the better the coding efficiency. The X-axis indicates the encoding speed in FPS (frames per second), with all encoders running on a single thread.
For the Aurora1 RTC version, new faster speed levels will be developed. The Aurora1 speed levels, from medium, fast, faster, veryfast, and superfast to ultrafast, will maintain the largest possible coding efficiency advantage of AV1 over existing H.264/AVC, VP8, and VP9 solutions for RTC applications.
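For readers unfamiliar with the metric, the sketch below shows how a BD-rate style number can be computed from rate-quality points. It is a simplified variant, not the exact procedure behind the figures: the standard Bjontegaard method fits a smooth curve through the points, whereas this version uses piecewise-linear interpolation of log-bitrate over PSNR. The function names and the sample points are hypothetical and chosen only so that the arithmetic is easy to follow.

```python
import math


def _interp(x, xs, ys):
    """Linear interpolation of y at x; xs must be strictly increasing and cover x."""
    for i in range(len(xs) - 1):
        x0, x1 = xs[i], xs[i + 1]
        if x0 <= x <= x1:
            return ys[i] + (ys[i + 1] - ys[i]) * (x - x0) / (x1 - x0)
    raise ValueError("x outside interpolation range")


def bd_rate_linear(anchor, test):
    """Average bitrate difference (percent) of test vs. anchor at equal PSNR.

    anchor and test are lists of (bitrate_kbps, psnr_db) with distinct PSNR values.
    Negative results mean the test encoder needs less bitrate than the anchor.
    """
    a = sorted(anchor, key=lambda p: p[1])
    t = sorted(test, key=lambda p: p[1])
    a_q, a_r = [p[1] for p in a], [math.log10(p[0]) for p in a]
    t_q, t_r = [p[1] for p in t], [math.log10(p[0]) for p in t]

    # Only the PSNR range covered by both curves can be compared.
    lo, hi = max(a_q[0], t_q[0]), min(a_q[-1], t_q[-1])
    if lo >= hi:
        raise ValueError("curves do not overlap in PSNR")

    # Integrate the log-rate difference with the trapezoid rule over all breakpoints.
    knots = sorted({lo, hi, *(q for q in a_q + t_q if lo < q < hi)})
    diffs = [_interp(q, t_q, t_r) - _interp(q, a_q, a_r) for q in knots]
    area = sum((knots[i + 1] - knots[i]) * (diffs[i] + diffs[i + 1]) / 2
               for i in range(len(knots) - 1))
    avg_log_diff = area / (hi - lo)
    return (10 ** avg_log_diff - 1) * 100


# Made-up points: the test encoder hits each PSNR at one fifth of the anchor bitrate.
anchor = [(1000, 30.0), (2000, 33.0), (4000, 36.0), (8000, 39.0)]
test = [(200, 30.0), (400, 33.0), (800, 36.0), (1600, 39.0)]
print(round(bd_rate_linear(anchor, test), 2))  # -80.0
```

A result of -80 here corresponds to the test encoder needing one fifth of the anchor's bitrate for the same PSNR across the whole overlapping quality range.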