Getting RTL Performance from a Processor
Tensilica’s Xtensa LX2 processor takes application
performance to new heights. In benchmark
after benchmark, the Xtensa LX2 processor proves it can
reach performance levels that are orders of magnitude
above all other processor cores, rivaling RTL performance.
How is this possible? Because you can configure
and extend the processor to your exact application
requirements, the Xtensa LX2 processor can reach
RTL speeds in two major ways:
- The Xtensa LX2 processor’s innovative
FLIX (Flexible Length Instruction Xtensions)
architecture allows designers to pack multiple
operations more efficiently into wider words.
- Designers can add RTL-like functions right
into the execution units of the processor using
Tensilica’s automated processes.
FLIX Packs It In
The FLIX architecture allows the implementation
of highly parallel processors with a performance
characteristic of specialty ultra-wide instruction
word processors, without the negative code size
implications typically found in such VLIW or ULIW
solutions.
FLIX is a configuration option that allows designer-defined
instructions to consist of multiple, independent
operations bundled into a 32-bit or 64-bit instruction
word. Wide 32- or 64-bit FLIX instruction formats
are seamlessly and modelessly intermixed with the
base Xtensa ISA’s existing 16-/24-bit instructions;
there is no mode-switch penalty for using a
FLIX instruction.
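The code-size argument can be illustrated with a toy model. The sketch below is not a description of any actual Xtensa configuration: the three-slot wide word, the function names, and the sample instruction mix are all assumptions chosen for illustration. It compares a hypothetical design in which every instruction is a full wide word (padded with no-ops) against one in which wide words and narrow base instructions are freely intermixed.

```python
# Toy code-size model. Every number here is an illustrative assumption,
# not a measurement of any Xtensa configuration.

BASE_BITS = 24     # one narrow base instruction (the text cites 16-/24-bit formats)
BUNDLE_BITS = 64   # one wide word (the text cites 32- or 64-bit formats)
SLOTS = 3          # hypothetical number of operation slots per wide word


def fixed_width_size(groups):
    """Bits needed when every instruction is a full wide word.

    Each entry in `groups` is the number of independent operations that
    could issue together. Unused slots are padded with no-ops.
    """
    words = sum(-(-n // SLOTS) for n in groups)  # ceiling division per group
    return words * BUNDLE_BITS


def mixed_width_size(groups):
    """Bits needed when wide words and narrow instructions are intermixed.

    Full groups of SLOTS operations go into a wide word; leftover
    operations are emitted as individual narrow instructions.
    """
    bits = 0
    for n in groups:
        full, rest = divmod(n, SLOTS)
        bits += full * BUNDLE_BITS + rest * BASE_BITS
    return bits


# Hypothetical instruction mix: mostly serial code with two parallel regions.
sample = [1, 1, 3, 1, 6, 1, 1]
print(fixed_width_size(sample), mixed_width_size(sample))
```

In this made-up mix the serial operations dominate, so padding every one of them out to a wide word is what inflates the fixed-width total; the mixed encoding pays for the wide format only where operations actually issue in parallel.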
Designers can define their own FLIX implementations
or use the XPRES Compiler to automatically determine
the best FLIX combinations.
Add RTL-like Functions Into the Processor’s
Execution Units, Automatically
Tensilica lets designers add specialized functions
right into the processor’s execution units
without requiring that the designers understand
the processor architecture. Designers just input
their C/C++ algorithms into Tensilica’s XPRES
Compiler, and the compiler determines the
best configuration options and extensions
for the design. Alternatively, designers can decide
for themselves which accelerators to add to the processor.
Tensilica’s XPRES Compiler can take a quick
(usually under one hour) look at your C/C++ algorithm
and recommend several ways to extend the Xtensa
processor to get the performance you need to run
that algorithm without any RTL coding. The XPRES
Compiler uses a number of techniques that allow
it to achieve the 8X improvement
shown in the EEMBC benchmark below.
Benchmarks Prove It – Xtensa is Fast
The Xtensa LX configurable processor core received
the highest certified out-of-the-box score ever
recorded for any 32-bit or 64-bit processor core
tested against the Consumer benchmark suite of
the Embedded Microprocessor Benchmark Consortium
(EEMBC). The Xtensa LX processor’s score
of 0.51997 per MHz, which corresponds to 171.6
Consumermarks in a 330-MHz simulation, was nearly
nine times faster than the next best 32-bit core
and over five times as fast as the fastest 64-bit
RISC CPU tested by EEMBC.
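The relationship between the per-MHz score and the total score quoted above is a simple multiplication, sketched below as a quick check. The two input values come from the text; the function and variable names are made up for this example.

```python
# Check the arithmetic behind the quoted EEMBC figures.
score_per_mhz = 0.51997   # Consumermarks per MHz, as quoted in the text
clock_mhz = 330           # simulated clock frequency, as quoted in the text


def scaled_score(per_mhz, mhz):
    """Total score at a given clock, assuming the score scales linearly with MHz."""
    return per_mhz * mhz


total = scaled_score(score_per_mhz, clock_mhz)
print(round(total, 1))
```

Rounded to one decimal place, the product matches the 171.6 Consumermarks figure given in the text.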

Xtensa LX outperforms
every other licensable CPU core ever tested by
EEMBC on the Consumer “Out of the Box” benchmark
Source: www.eembc.org
The “out of the box” scores are a
good test of compiler performance. The more C-friendly
the processor, the better the score, as the processor
vendor is not allowed to modify the original EEMBC
source code. The exceptional results for the Xtensa
LX processor demonstrate Tensilica’s advanced
XPRES Compiler technology.
On a separate benchmark, the Xtensa LX configurable
processor core achieved the highest score recorded
to date (as of May 2004) for a licensable processor
core on the BDTI Benchmarks™ by Berkeley
Design Technology, Inc. (BDTI). The Xtensa LX BDTIsimMark2000™ score
of 6150 at 370 MHz is 70% faster than the score
for the next-fastest licensable core benchmarked
by BDTI, the CEVA-X1620.

Xtensa LX configuration
as tested by BDTI: 248,600 “gates” (equivalent
NAND2X cell area) at post-synthesis; 4.4 mm² actual
layout area; 3D extracted final layout timing
under worst case conditions: 369 MHz
For more detail on this benchmark, see our explanation
of how we created a configuration of Xtensa
LX optimized for this DSP application.