Performance using C Code

Great Performance from C Algorithms - No Assembly Required

The ConnX D2 DSP engine is tightly integrated with the advanced Tensilica XCC compiler technology. The XCC compiler efficiently maps algorithms written in native C, or in C with intrinsics, to the ConnX D2 ISA (instruction set architecture), removing the need for time-consuming assembly code optimization. An example of this "out of the box" performance is the AMR-NB (VAD2) algorithm (encoder + decoder), which requires just 21.6 MHz. In some comparisons, that is as much as twice the performance of equivalent competitive DSP cores. Another example is the ConnX D2 running a 256-point complex FFT (fast Fourier transform), which gives 20% better performance than a competitive DSP core running hand-optimized assembly code.

Because optimized software can be written in C, design teams can respond faster to changing algorithms and depend less on key programming resources.

We've Tested It and It Works

Tensilica has tested optimized C source code from Tata Elxsi's extensive DSP software library, and the existing code - including code optimized with industry-standard C intrinsics - ran flawlessly on the ConnX D2 DSP engine. This means that designers can rely on Tata Elxsi's proven software services to quickly get new ConnX D2-based SoCs designed into innovative new products. It also means that any code you have that uses TI C6x or ITU-T intrinsics should work just fine on the ConnX D2 DSP engine. See press release.
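To make "code optimized with ITU-T intrinsics" concrete, here is a minimal portable sketch of the kind of saturating fixed-point code involved. The helper names ref_L_mult and ref_L_mac are hypothetical; they are written to mirror the commonly documented behavior of the ITU-T basic operators L_mult and L_mac, and they are not taken from any Tensilica or Tata Elxsi header. The assumption is that a compiler with intrinsic support would map operations like these onto native MAC instructions.

```c
#include <stddef.h>
#include <stdint.h>

/* Clamp a 64-bit intermediate result to the signed 32-bit range. */
static int32_t sat32(int64_t x)
{
    if (x > INT32_MAX) return INT32_MAX;
    if (x < INT32_MIN) return INT32_MIN;
    return (int32_t)x;
}

/* Hypothetical reference for an L_mult-style operator:
 * Q15 x Q15 -> Q31 fractional multiply, saturating on overflow. */
static int32_t ref_L_mult(int16_t a, int16_t b)
{
    return sat32(2 * (int64_t)a * (int64_t)b);
}

/* Hypothetical reference for an L_mac-style operator:
 * saturating accumulate of a fractional product. */
static int32_t ref_L_mac(int32_t acc, int16_t a, int16_t b)
{
    return sat32((int64_t)acc + ref_L_mult(a, b));
}

/* Q15 dot product written only in terms of the operators above. */
int32_t dot_q15(const int16_t *x, const int16_t *y, size_t n)
{
    int32_t acc = 0;
    for (size_t i = 0; i < n; i++)
        acc = ref_L_mac(acc, x[i], y[i]);
    return acc;
}
```

Calling dot_q15 on two Q15 vectors returns a Q31 result. The only overflowing product, -1.0 times -1.0, saturates to INT32_MAX rather than wrapping.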

Consistent Performance - Even when Vectorization is Not Possible

Many high-performance DSPs are large SIMD engines that run vector data through at maximum bandwidth.

These DSPs rely on compiler vectorization of C code to reach their peak performance. If a loop cannot be vectorized, however, the SIMD engine degenerates into a single-MAC DSP. The major shortcoming of these DSPs is that non-vectorizable code is commonplace. One example is a loop whose address increments are something other than a single unit step.
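The two loops below sketch that difference, reading the example as a non-unit stride. The function names are hypothetical, and whether a particular compiler vectorizes either loop is an assumption, not a measured result.

```c
#include <stddef.h>
#include <stdint.h>

/* Unit stride: x and h are both read from consecutive addresses,
 * the access pattern SIMD vectorizers handle most readily. */
int32_t sum_products_unit(const int16_t *x, const int16_t *h, size_t n)
{
    int32_t acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += (int32_t)x[i] * h[i];
    return acc;
}

/* Non-unit stride: x advances by `stride` elements per iteration, so
 * consecutive iterations no longer read adjacent memory. A vectorizer
 * may give up here and emit one scalar MAC per iteration. */
int32_t sum_products_strided(const int16_t *x, size_t stride,
                             const int16_t *h, size_t n)
{
    int32_t acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += (int32_t)x[i * stride] * h[i];
    return acc;
}
```

Both functions do the same arithmetic per iteration. Only the address pattern of x differs, and that alone can decide whether the loop is vectorized.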

The ConnX D2 engine solves this problem by using the parallel computation of the FLIX (VLIW) architecture. The dual MACs in the ConnX D2 engine can be fully saturated with either SIMD instructions or VLIW instructions, delivering maximum performance on all types of C code.
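As a rough illustration of the parallelism a two-issue VLIW machine can exploit, the sketch below rewrites a non-unit-stride multiply-accumulate loop with two independent accumulators. The function name is hypothetical and the manual unrolling is for illustration only. It makes no claim about how the XCC compiler actually schedules such a loop, or that hand unrolling is required.

```c
#include <stddef.h>
#include <stdint.h>

/* Strided multiply-accumulate with two independent accumulators.
 * The two MACs in the loop body do not depend on each other, so a
 * machine that issues two operations per cycle could run them side
 * by side even though the access pattern is not SIMD-friendly. */
int32_t sum_products_strided_x2(const int16_t *x, size_t stride,
                                const int16_t *h, size_t n)
{
    int32_t acc0 = 0;
    int32_t acc1 = 0;
    size_t i = 0;

    for (; i + 1 < n; i += 2) {
        acc0 += (int32_t)x[i * stride] * h[i];             /* first MAC  */
        acc1 += (int32_t)x[(i + 1) * stride] * h[i + 1];   /* second MAC */
    }
    if (i < n)                                             /* odd tail   */
        acc0 += (int32_t)x[i * stride] * h[i];

    return acc0 + acc1;
}
```

Provided no intermediate sum overflows, the result equals that of the plain single-accumulator loop. Only the dependency structure changes.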

VLIW in ConnX D2

VLIW provides twice the performance by executing two instructions in parallel

Here's a simplified block structure of the ConnX D2 engine with two MAC units as well as register banks.

ConnX D2 architecture

A simplified block structure of the ConnX Baseband Engine with 8-way SIMD and 3-way VLIW

