One of the most difficult parts of encoding MPEG-4 video data is motion estimation, which requires the ability to search adjacent video frames for similar pixel blocks.
The search algorithm uses a SAD (sum of absolute differences) operation with three steps: subtract one pixel value from another, take the absolute value of the difference, and accumulate that result across the entire video frame.
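The three steps can be sketched in plain C. This is a minimal illustration, not code from Tensilica: the function name `sad_scalar`, the parameter names, and the choice of 8-bit pixel samples are all assumptions made for the example.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Scalar sum of absolute differences over n pixels.
 * Assumes 8-bit pixel samples; all names here are illustrative. */
uint32_t sad_scalar(const uint8_t *cur, const uint8_t *ref, size_t n)
{
    uint32_t acc = 0;
    for (size_t i = 0; i < n; i++) {
        int diff = (int)cur[i] - (int)ref[i];  /* step 1: subtraction */
        int mag = abs(diff);                   /* step 2: absolute value */
        acc += (uint32_t)mag;                  /* step 3: accumulation */
    }
    return acc;
}
```

On a conventional processor, each pixel costs at least these three operations, which is why the operation count grows so quickly with frame size and search range.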
For a QCIF (quarter common intermediate format) video frame at 15 frames/second, the SAD operation for motion estimation requires just over 641 million operations/second.
As shown in the figure below, Tensilica’s TIE language makes it possible to add SIMD (single instruction, multiple data) SAD hardware that executes a 16-pixel-wide SAD instruction every cycle. With a 128-bit-wide bus, a single load instruction can also fetch 16 pixels’ worth of data.

Adding a SIMD SAD computational engine reduces the computational load by 46x
Combining all three SAD component operations into a single instruction, and then extending that instruction with SIMD so that it computes the values for 16 pixels in one clock cycle, funnels the 641 million operations/second requirement into 14 million instructions/second, a reduction of about 46x.
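To make the fused instruction concrete, here is a plain-C model of what one 16-lane SAD step computes. It is only a sketch under stated assumptions: it is not TIE code and not Tensilica's implementation, the names `sad16_model` and `sad_macroblock` are hypothetical, and it assumes 8-bit pixels and a 16x16 pixel block stored row by row. In this idealized model, one call stands in for 3 x 16 = 48 scalar operations. The figures quoted in the article work out to 641 / 14, or roughly 46.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define SAD_LANES 16  /* 16 pixels per 128-bit load, assuming 8-bit pixels */

/* Software model of one fused SIMD SAD step (hypothetical name):
 * subtract, take the absolute value, and accumulate for 16 pixels.
 * In hardware this whole loop would be a single instruction. */
uint32_t sad16_model(const uint8_t *cur, const uint8_t *ref, uint32_t acc)
{
    for (int lane = 0; lane < SAD_LANES; lane++) {
        acc += (uint32_t)abs((int)cur[lane] - (int)ref[lane]);
    }
    return acc;
}

/* SAD of a 16x16 block (assumed size): 16 fused steps instead of
 * 16 * 16 * 3 = 768 scalar operations. stride is the row pitch in bytes. */
uint32_t sad_macroblock(const uint8_t *cur, const uint8_t *ref, size_t stride)
{
    uint32_t acc = 0;
    for (int row = 0; row < 16; row++) {
        acc = sad16_model(cur + (size_t)row * stride,
                          ref + (size_t)row * stride, acc);
    }
    return acc;
}
```

The accumulator is passed in and returned so that successive steps chain together, which mirrors how a SAD instruction would keep a running sum across the rows of a block.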