Tensilica’s Xtensa LX processor takes application performance to new heights. In benchmark after benchmark, the Xtensa LX processor reaches performance levels far above those of other processor cores, rivaling RTL performance. How is this possible? Because you can configure and extend the processor to match your exact application requirements, the Xtensa LX processor can reach RTL speeds in two major ways: FLIX multi-operation instructions and designer-defined TIE instructions.
The FLIX architecture allows the implementation of highly parallel processors with the performance characteristics of specialized ultra-wide instruction word processors, without the code-size penalty typically found in VLIW or ULIW solutions.
FLIX is a configuration option that allows designer-defined instructions to consist of multiple, independent operations bundled into a 32-bit or 64-bit instruction word. Wide 32- or 64-bit FLIX instruction formats are seamlessly and modelessly intermixed with the base Xtensa ISA’s existing 16- and 24-bit instructions; there is no mode-switch penalty for using a FLIX instruction.
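To make the code-size argument concrete, here is a rough Python model. It is an assumption-laden sketch, not Tensilica’s actual encoding: it compares a modeless stream that mixes 16/24-bit base instructions with 64-bit FLIX bundles against a hypothetical fixed-width 64-bit VLIW in which every instruction occupies a full word. It assumes the same instruction sequence in both cases and ignores decode, alignment, and scheduling details; the names `mixed_stream_bytes` and `fixed_vliw_bytes` are illustrative only.

```python
# Illustrative model only: each entry is an instruction width in bits.
# Real Xtensa encodings, slot layouts, and instruction counts are not modeled.
ALLOWED_WIDTHS = {16, 24, 32, 64}  # base 16/24-bit plus FLIX 32/64-bit


def mixed_stream_bytes(stream):
    """Code size when narrow base instructions and wide FLIX bundles are
    freely intermixed, each occupying only its own width."""
    for bits in stream:
        if bits not in ALLOWED_WIDTHS:
            raise ValueError(f"unsupported width: {bits}")
    return sum(bits // 8 for bits in stream)


def fixed_vliw_bytes(stream, word_bits=64):
    """Code size if every instruction, even a single operation, had to fill a
    fixed-width VLIW word, with unused slots padded by NOPs."""
    return len(stream) * (word_bits // 8)


# Hypothetical code: mostly serial, with two parallel bundles in a hot loop.
stream = [24, 16, 64, 64, 24, 16, 24]
print(mixed_stream_bytes(stream))  # 29
print(fixed_vliw_bytes(stream))    # 56
```

In this toy case the wide format is paid for only where parallelism exists, so serial code keeps its compact 16/24-bit encoding.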
Designers can figure out their own FLIX implementations or use the to automatically determine the best FLIX combinations.
Tensilica lets designers add specialized functions directly into the processor’s execution units without requiring them to understand the processor architecture. Designers use our TIE (Tensilica Instruction Extension) language, a Verilog-like language, to specify the required functionality. Our Xtensa Processor Generator automatically adds these functions to the processor, and we guarantee that the result is functionally correct. You never touch the actual processor RTL to make your modifications.
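TIE source itself is not shown here. Instead, the following is a minimal Python sketch of the kind of bit-accurate reference model a designer might write before describing a custom instruction, assuming a hypothetical instruction (called `ADD4` here purely for illustration) that performs four independent 8-bit additions on lanes packed in 32-bit registers. In software on a base ISA this takes a loop of shifts, masks, and adds; as a custom instruction it could be a single operation.

```python
def add4_reference(a: int, b: int) -> int:
    """Golden model of a hypothetical ADD4 instruction: four independent
    8-bit adds on byte lanes of 32-bit operands. Each lane wraps modulo 256,
    and no carry propagates into the neighboring lane."""
    result = 0
    for lane in range(4):
        shift = 8 * lane
        lane_sum = ((a >> shift) & 0xFF) + ((b >> shift) & 0xFF)
        result |= (lane_sum & 0xFF) << shift  # drop the carry out of the lane
    return result


print(hex(add4_reference(0x01020304, 0x10203040)))  # 0x11223344
```

A model like this can serve as the expected result when checking a custom instruction in simulation.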
The Xtensa LX configurable processor core received the highest certified out-of-the-box score ever recorded for any 32-bit or 64-bit processor core tested against the Consumer benchmark suite of the Embedded Microprocessor Benchmark Consortium (EEMBC). The Xtensa LX processor’s score of 0.51997 per MHz, which corresponds to 171.6 Consumermarks in a 330-MHz simulation, was nearly nine times as fast as the next-best 32-bit core and more than five times as fast as the fastest 64-bit RISC CPU tested by EEMBC.
Xtensa LX outperforms every other licensable CPU core ever tested by EEMBC on the Consumer “Out of the Box” benchmark
Source: www.eembc.org

The “out of the box” scores are a good test of compiler performance, because the benchmark source code is compiled without hand optimization. The exceptional results for the Xtensa LX processor demonstrate Tensilica’s advanced compiler technology.
On a separate benchmark, the Xtensa LX configurable processor core achieved the highest score recorded to date (as of November 2009; the score was certified in May 2004) for a licensable processor core on the BDTI Benchmarks™ by Berkeley Design Technology, Inc. (BDTI). The Xtensa LX BDTIsimMark2000™ score of 6150 at 370 MHz is 70% faster than the score for the next-fastest licensable core benchmarked by BDTI, the CEVA-X1620.*

Xtensa LX configuration as tested by BDTI: 248,600 “gates” (equivalent NAND2X cell area) post-synthesis; 4.4 mm² actual layout area in 130 nm; 3D-extracted final layout timing under worst-case conditions: 369 MHz
For this benchmark, Tensilica created a unique, optimized processor configuration. Tensilica’s engineers used the Xtensa Processor Generator, selecting the check-box options that fit the benchmark. Then Tensilica’s engineers added 12 custom instructions using the TIE (Tensilica Instruction Extension) methodology to further accelerate performance hot spots in the algorithms.
The configuration chosen for the BDTI Benchmarks™ is approximately 250K gates, occupies 4.4 mm² and is projected to achieve a robust 370 MHz clock rate under worst-case operating conditions in a commercially available 130 nm process from a leading wafer foundry. Tensilica reports that this high-performance DSP core minimizes power requirements, dissipating a mere 0.53 mW/MHz in dynamic (switching) power under typical operating conditions, for a total power dissipation (dynamic plus leakage power) of only 200 mW at a 370 MHz operating speed.**
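The power figures above can be cross-checked with a little arithmetic. This sketch uses only the numbers in the text; the leakage value it derives is an inference, not a reported figure, and is only approximate because the 200 mW total appears to be rounded.

```python
dynamic_mw_per_mhz = 0.53  # dynamic power, typical conditions (from the text)
clock_mhz = 370            # operating speed in MHz (from the text)
total_mw = 200             # reported dynamic plus leakage power (from the text)

# Dynamic power scales linearly with clock frequency.
dynamic_mw = dynamic_mw_per_mhz * clock_mhz   # about 196.1 mW

# Whatever remains of the total is attributed to leakage (rough inference).
implied_leakage_mw = total_mw - dynamic_mw     # about 3.9 mW

print(f"dynamic ~ {dynamic_mw:.1f} mW, implied leakage ~ {implied_leakage_mw:.1f} mW")
```

In other words, dynamic switching power accounts for nearly all of the quoted 200 mW.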
The configuration used in the BDTI Benchmarks™ includes Tensilica’s Vectra LX engine, which gives designers a fast path to ultra-high-performance DSP capabilities. The Vectra LX DSP engine offers a rich, general-purpose DSP instruction set of more than 200 instructions tailored for classic signal processing algorithms such as filters and FFTs. It uses the Xtensa LX processor’s dual load/store units to provide full DSP capability in a small size with low power. The Vectra LX option can be used as-is by simply selecting it as a check-box configuration option for an Xtensa LX processor, or it can be delivered as a TIE source file for use as a starting point in the development of customized high-performance DSPs.
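To show why two load/store units matter for a filter, here is a plain Python reference of a direct-form FIR filter, y[n] = Σ h[k]·x[n−k], instrumented to count memory loads and multiply-accumulates (MACs). It is a generic textbook sketch, not Vectra LX code or intrinsics. Each tap needs two loads (one coefficient, one sample) per MAC, so a core that can issue both loads alongside a MAC could, in principle, sustain one tap per cycle; that reading of the dual load/store design is our interpretation, not a description of Vectra LX internals.

```python
def fir(x, h):
    """Direct-form FIR filter with samples before index 0 treated as zero.
    Returns (outputs, loads, macs) so the load-to-MAC ratio is visible."""
    y = []
    loads = 0
    macs = 0
    for n in range(len(x)):
        acc = 0
        for k in range(len(h)):
            if n - k < 0:
                break            # earlier samples are implicitly zero
            coeff = h[k]         # load 1: filter coefficient
            sample = x[n - k]    # load 2: input sample
            loads += 2
            acc += coeff * sample  # one multiply-accumulate per tap
            macs += 1
        y.append(acc)
    return y, loads, macs


y, loads, macs = fir([1, 2, 3], [1, 1])
print(y, loads, macs)  # [1, 3, 5] 10 5
```

Every MAC here is paired with exactly two loads, which is the memory bandwidth a single-cycle-per-tap filter must supply.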
Download BDTI’s Report on Tensilica Xtensa LX Processor with Vectra LX