Going Inside Beamr’s Frame-Level Content-Adaptive Rate Control for Video Coding
When it comes to video, the tradeoff between quality and bitrate is an ongoing dance. Content producers want to maximize quality for viewers, while storage and delivery costs drive the need to reduce bitrate as much as possible. Content-adaptive encoding addresses this challenge, by striving to reach the “optimal” bitrate for each unique piece of content, be it a full clip or a single scene. Our CABR technology takes it a step further by adapting the encoding at the frame level. CABR is a closed-loop content-adaptive rate control mechanism enabling video encoders to lower the bitrate of their encode, while simultaneously preserving the perceptual quality of the higher bitrate encode. As a low-complexity solution, CABR also works for live or real-time encoding.
All Eyes are on Video
According to Grand View Research, the global video streaming market is expected to grow at a CAGR of 19.6% from 2019 to 2025. This shift, fueled by the increasing popularity of direct-to-consumer streaming services such as Netflix, Amazon and Hulu, the growth of video on social media networks and user-generated video platforms such as Facebook and YouTube, and other applications like online education & video surveillance, has all eyes on video workflows. Therefore, efficient video encoding, in terms of encoding and delivery costs, and meeting the viewer’s rising quality expectations, are at the forefront of video service provider’s minds. Beamr’s CABR solution can reduce bitrates without compromising quality while keeping a low computational overhead to enhance video services.
Comparing Content-Adaptive Encoding Solutions
Instead of using fixed encoding parameters, content-adaptive encoding configures the video encoder according to the content of the video clip to reach the optimal tradeoff between bitrate and quality. Various content-adaptive encoding techniques have been used in the past to provide a better user experience with reduced delivery costs. Some of them have been entirely manual, where encoding parameters are hand-tuned for each content category and sometimes, like in the case of high-volume Blu-ray titles, at the scene level. Manual content-adaptive techniques are restricted in the sense that they can’t be scaled, and they don’t provide granularity lower than the scene level.
Other techniques, such as those used by YouTube and Netflix, use “brute force” encoding of each title by applying a wide range of encoding parameters, and then by employing rate-distortion models or machine learning techniques, try to select the best parameters for each title or scene. This approach requires a lot of CPU resources since many full encodes are performed on each title, at different resolutions and bitrates. Such techniques are suitable for diverse content libraries that are limited in size, such as premium content including TV series and movies. These methods do not apply well to vast repositories of videos such as user-generated content, and are not applicable to live encoding.
Beamr’s CABR solution is different from the techniques described above in that it works in a closed-loop and adapts the encoding per frame. The video encoder first encodes a frame using a configuration based on its regular rate control mechanism, resulting in an initial encode. Then, Beamr’s CABR rate control instructs the encoder to encode the same frame again with various values of encoding parameters, creating candidate encodes. Using a patented perceptual quality measure, each candidate encode is compared with the initial encode, and then the best candidate is selected and placed in the output stream. The best candidate is the one that has the lowest bitrate but still has the same perceptual quality as the initial encode.
Taking Advantage of Beamr’s CABR Rate Control
In order for Beamr’s CABR technology to encode video to the minimal bitrate and still retain the perceptual quality of a higher bitrate encode, it compresses each video frame to the maximum extent that provides the same visual quality when the video is viewed in motion. Figure 1 shows a block diagram of an encoding solution which incorporates CABR technology.
Figure 1 – A block diagram of the CABR encoding solution
An integrated CABR encoding solution consists of a video encoder and the CABR rate control engine. The CABR engine is comprised of the CABR control module responsible for managing the optimization process and a module which evaluates video quality.
As seen in Figure 2, the CABR encoding process consists of multiple steps. Some of these steps are performed once for each encoding session, some are performed once for each frame, and some are performed for each iteration of candidate frame encoding. When starting a content-adaptive encoding session, the CABR engine and the encoder are initialized. At this stage, we set system-level parameters such as the maximum number of iterations per frame. Then, for each frame, the encoder rate control module selects the frame types by applying its internal logic.
Figure 2. A block diagram of a video encoder incorporating Content Adaptive Bit-Rate encoding.
The encoder provides the CABR engine with each original input frame for pre-analysis within the quality measure calculator. The encoder performs an initial encode of the frame, using its own logic for bit allocation, motion estimation, mode selections, Quantization Parameters (QPs), etc. After encoding the frame, the encoder provides the CABR engine with the reconstructed frame corresponding to this initially encoded frame, along with some side information – such as the frame size in bits and the QP selected for each MacroBlock or Coding Tree Unit (CTU).
In each iteration, the CABR control module first decides if the frame should be re-encoded at all. This is done, for example, according to the frame type, the bit consumption of the frame, the quality of previous frames or iterations, and according to the maximum number of iterations set for the frame. In some cases, the CABR control module may decide not to re-encode a frame at all – in that case, the initial encoded frame becomes the output frame, and the encoder continues to the next frame. When the CABR control module decides to re-encode, the CABR engine provides the encoder with modified encoding parameters, for example, a proposed average QP for the frame, or the difference from the QP used for the initial encode. Note that the QP or delta QP values are an average value, and QP modulation for each encoding block can still be performed by the encoder. In more sophisticated implementations a QP map of value per encoding block may be provided, as well as additional encoder configuration parameters.
The encoder performs a re-encode of the frame with the modified parameters. Note that this re-encode is not a full encode, since it can utilize many encoding decisions from the initial encode. In fact, the encoder may perform only re-quantization of the frame, reusing all previous motion vectors and mode decisions. Then, the encoder provides the CABR engine with the reconstructed re-encoded frame, which becomes one of the candidate frames. The quality measure module then calculates the quality of the candidate re-encoded frame relative to the initially encoded frame, and this quality score, along with the bit consumption reported by the encoder is provided to the CABR control module, which again determines if the frame should be re-encoded. When that is the case, the CABR control module sets the encoding parameters for the next iteration, and the above process is repeated. If the control module decides that the search for the optimal frame parameters is complete, it indicates which frame, among all previously encoded versions of this frame, should be used in the output video stream. Note that the encoder rate control module receives its feedback from the initial encode of the current frame, and in this way the initial encode of the next frames (which determines the target quality of the bitstream) is not affected.
The CABR engine can operate in either a serial iterative approach or a parallel approach. In the serial approach, the results from previous iterations can be used to select the QP value for the next iteration. In the parallel approach, all candidate QP values are provided simultaneously and encodes are done in parallel – which reduces latency.
Integrating the CABR Engine with Software & Hardware Encoders
Beamr has integrated the CABR engine into its AVC software encoder, Beamr 4, and into its HEVC software encoder, Beamr 5. However, the CABR engine can be integrated with any software or hardware video encoder, supporting any block-based video standard such as MPEG-2, AVC, HEVC, EVC, VVC, VP9, and AV1.
To integrate the CABR engine with a video encoder, the encoder should support several requirements. First and foremost, the encoder should be able to re-encode an input frame (that has already been encoded) with several different encoding parameters (such as QP values), and save the “state” of each of these encodes, including the initial encode. The reason for saving the state is that when the CABR control module selects one of the candidate frame encodes (or the initial encode) as the one to use in the output stream, the encoder’s state should correspond to the state it was right after encoding that candidate frame. Encoders that support multi-threaded operation and hardware encoders typically have this capability, since each frame encode is performed by a stateless unit.
Second, the encoder should support an interface to provide the reconstructed frame and the per block QP and bit consumption information for the encoded frame. To improve compute performance, we also recommend that the encoder supports a partial re-encode mode, where information related to motion estimation, partitioning and mode decisions found in the initial encode can be re-used for re-encoding without being computed again, and only the quantization and entropy coding stages are repeated for each candidate encode. This results in a minimal encoding efficiency drop for the optimized encoding result, with significant speed-up compared to full re-encode. As described above, we recommend that the encoder will use the initial encoded data (QPs, compressed size, etc.) for its Rate Control state update. However, the selected frame and accompanying data must be used for reference frames and other reference data, such as temporal MV predictors, as it is the only data available in the bitstream for decoding.
When integrating with hardware encoders that support parallel encoding with no increase in latency, we recommend using the parallel search approach where multiple QP values per frame are evaluated simultaneously. If the hardware encoder can perform parallel partial encodes (for example, re-quantization and entropy coding only), while all parallel encodes use the analysis stage of the initial encode, such as motion estimation and mode decisions, better CPU performance will be achieved.
Below, we provide two sample results of the CABR engine, when integrated with Beamr 5, Beamr’s HEVC software encoder, each illustrating different aspects of CABR.
For the first example, we encoded various 4K 24 FPS source clips to a target bitrate of 10 Mbps. Sample frames from each of the clips can be seen in Figure 3. The clips vary in their content complexity: “Crowd Run” has very high complexity since it has great detail and very significant motion of the runners. “StEM” has medium complexity, with some video compression challenges such as different lighting conditions and reasonably high film grain. Finally, a promotional clip of JPEGmini by Beamr has low complexity due to relatively low motion and simple scenes.
Figure 3. Sample frames from the test clips. top: crowd-run, bottom left: StEM bottom right: JPEGmini.
We encoded 500 frames from each clip to a target bitrate of 10 Mbps, using the VBR mode of the Beamr 5 HEVC encoder, which performs regular encoding, and using the CABR mode, which creates a lower bit-rate, perceptually identical stream. For the high complexity clip “Crowd Run,” where providing excellent quality at such an aggressive bitrate is very challenging, CABR reduced the bitrate by only 3%. For the intermediate complexity clip “StEM,” bitrate savings were higher and reached 17%. For the lowest complexity clip “JPEGmini,” CABR reduced the bitrate by a staggering 45%, while still obtaining excellent quality which matches the quality of the 10 Mbps VBR encode. This extensive range of bitrate reduction percentage demonstrates the fully automatic content-adaptive nature of CABR-enhanced encoder, which reaches a different final bitrate, according to the content complexity.
The second example uses a 500 frame 1080p 24 FPS clip from the well-known “Tears Of Steel” movie by the Blender open movie project. The same clip was encoded using the VBR and CABR modes of the Beamr 5 HEVC software encoder, with three target bitrates: 1.5, 3 and 5 Mbps. Savings, in this case, were 13% for the lowest bitrate resulting in a 1.4 Mbps encode, 44% for the intermediate bitrate resulting in an encode of 1.8 Mbps, and 62% for the highest bitrate, resulting in a 2 Mbps encode. Figures 4 and 5 show sample frames from the encoded clips with VBR encoding on the left vs. CABR encoding on the right. The top two images are from encodes to a bitrate of 5 Mbps, while the bottom two were taken from the 1.5 Mbps encodes. As can be seen here, both 5 Mbps target encodes preserve the details, such as the texture of the bottom lip or the two hairs on the forehead above the right eye, while in the lower bitrate encodes these details are somewhat blurred. This is the reason that when starting from different target bitrates, CABR does not converge to the same bitrate. We also see, however, that the more generous the initial encoding, generally the more savings can be obtained. This example shows that CABR adapts not only to the content complexity, but also to the quality of the target encode, and preserves perceptual quality in motion while offering significant savings.
Figure 4. A sample from the “Tears of Steel” 1080p 24 FPS encode to 5 Mbps (top) and 1.5 Mbps (bottom), encoded in VBR mode (left) and CABR mode (right)
Figure 5. Closer view of the face in Figure 4, showing detail of lips and forehead from the encode to 5 Mbps (top) and 1.5 Mbps (bottom), encoded in VBR mode (left) and CABR mode (right).
To learn how our CABR solution leverages our patented quality measure, continue to “The patented visual quality measure that makes all the difference.”