Because of the nature of digital video compression, some video frames take longer to process than others. These wide variations in video-processing time make the correct operation of an ASIC or SoC video-processing system difficult to predict. A video processor that minimizes processing-time variations from one video frame to the next enables the design of a more reliable and less expensive system that also consumes less power.
Whether video is processed in a hardware block, a general-purpose processor, or an optimized DSP, each frame takes a different amount of time to process. This is because each frame of video is different and the encoder compresses each one in a different way for best efficiency. Most digital video coding standards process video as a sequence of square macroblocks. An important video-compression technique is identifying and coding macroblocks in each video frame that are identical or similar to their neighbors. Finding nearly identical macroblocks can reduce video-coding time. For example, the sky appearing at the top left corner of the image in Figure 1 is almost identical from one macroblock to the adjacent macroblocks.
Another important technique for achieving compression is identifying objects that have moved from a nearby location in a previous frame of video, such as the light post on the bridge. Finding similar macroblocks within one frame and finding similar macroblocks between successive video frames are the essential algorithms behind the techniques called intra-frame and inter-frame prediction, respectively.
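The search for a moved object in a previous frame can be illustrated with a minimal block-matching sketch. It assumes a brute-force search that scores candidates by the sum of absolute differences (SAD), which is one common choice and not necessarily what any particular encoder uses. The function names, the toy 8x8 frames, and the search range are all hypothetical.

```python
def sad(frame_a, ax, ay, frame_b, bx, by, size):
    """Sum of absolute differences between two size x size blocks."""
    total = 0
    for dy in range(size):
        for dx in range(size):
            total += abs(frame_a[ay + dy][ax + dx] - frame_b[by + dy][bx + dx])
    return total


def best_match(cur, prev, x, y, size, search):
    """Return (cost, dx, dy) for the block in prev that best matches
    the block at (x, y) in cur, searching +/- search pixels."""
    height, width = len(prev), len(prev[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            px, py = x + dx, y + dy
            # Skip candidate blocks that fall outside the previous frame.
            if 0 <= px <= width - size and 0 <= py <= height - size:
                cost = sad(cur, x, y, prev, px, py, size)
                if best is None or cost < best[0]:
                    best = (cost, dx, dy)
    return best


def make_frame(ox, oy, width=8, height=8):
    """Toy frame: flat background with a bright 2x2 object at (ox, oy)."""
    frame = [[10] * width for _ in range(height)]
    for dy in range(2):
        for dx in range(2):
            frame[oy + dy][ox + dx] = 200
    return frame


prev_frame = make_frame(2, 3)   # object at (2, 3) in the previous frame
cur_frame = make_frame(4, 4)    # object has moved to (4, 4)
cost, dx, dy = best_match(cur_frame, prev_frame, 4, 4, size=2, search=3)
print(cost, dx, dy)  # 0 -2 -1
```

The displacement (-2, -1) points from the block's current position back to where the same pixels sat in the previous frame. Because the number of candidates grows with the search range, the time spent in a search like this depends heavily on frame content and search settings.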
Using prediction, a video encoder encodes the location, in the current or a previous frame, from which to predict each macroblock. The encoder also encodes the imperfections between the prediction and the actual captured pixels. The imperfections, called residuals, tend to have much smaller data values than unencoded pixels, so less data must be encoded than would be the case without prediction. This reduction in data leads to compression.
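The relationship between a prediction, the actual pixels, and the residuals can be sketched in a few lines. This is a simplified illustration, not a real codec: the predictor simply copies the left neighbor block, the 2x2 blocks stand in for macroblocks, and all names and pixel values are hypothetical.

```python
def predict_from_left(left_block):
    """Hypothetical intra predictor: reuse the left neighbor's pixels."""
    return [row[:] for row in left_block]


def residual(actual, prediction):
    """Per-pixel difference between captured pixels and the prediction."""
    return [[a - p for a, p in zip(row_a, row_p)]
            for row_a, row_p in zip(actual, prediction)]


def reconstruct(prediction, res):
    """Decoder side: prediction plus residual restores the pixels."""
    return [[p + r for p, r in zip(row_p, row_r)]
            for row_p, row_r in zip(prediction, res)]


left = [[120, 121], [121, 122]]     # already-coded neighbor (smooth sky)
actual = [[121, 121], [122, 123]]   # block being coded

pred = predict_from_left(left)
res = residual(actual, pred)
print(res)                             # [[1, 0], [1, 1]]
print(reconstruct(pred, res) == actual)  # True
```

The residual values here are 0 or 1, while the raw pixel values are above 120, which is why the residuals are cheaper to encode than the unencoded pixels.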
Video frames with many finely detailed macroblocks are predicted less accurately from their neighbors, and video frames with many moving objects are predicted less accurately from frame to frame. The marathon frame in Figure 2(a) requires more data for prediction and yields less accurate prediction than the sky appearing in the frame shown in Figure 2(b). Therefore, a video coder requires more bits to encode the prediction imperfections of the frame in Figure 2(a) than it does for the frame shown in Figure 2(b).
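The difference between a detailed frame and a smooth one can be made concrete with a rough proxy. The sketch below assumes each pixel is predicted from its left neighbor and uses the sum of absolute residuals as a stand-in for coding cost. Real encoders measure cost in bits after transform and entropy coding, so this is only an indication of the trend. The pixel rows are invented for illustration and are not taken from Figure 2.

```python
def residual_energy(row):
    """Sum of absolute differences between each pixel and its left neighbor."""
    return sum(abs(b - a) for a, b in zip(row, row[1:]))


# Hypothetical pixel rows: a smooth gradient and a busy, detailed texture.
sky_row = [200, 201, 201, 202, 202, 203, 203, 204]
crowd_row = [35, 180, 60, 220, 15, 140, 90, 240]

sky_cost = residual_energy(sky_row)
crowd_cost = residual_energy(crowd_row)
print(sky_cost, crowd_cost)  # 4 955
```

The detailed row leaves far larger residuals than the smooth row, so an encoder has much more information to code for it.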