Xtensa LX2 Tops Benchmarks
In addition to being the ideal alternative methodology
for hardware block design, the Xtensa LX2 processor
excels at traditional CPU and DSP tasks in embedded
SOCs, as demonstrated by industry-leading benchmark
results on the EEMBC (Embedded Microprocessor Benchmark
Consortium) benchmark suite and the BDTI
Benchmarks™ by Berkeley Design Technology,
Inc. (BDTI).
Tops EEMBC Networking 2.0 Benchmark
Tensilica’s Xtensa LX processor achieved the highest score
ever reported on the Networking Version 2.0 benchmark
suite of the Embedded Microprocessor Benchmark Consortium (EEMBC).
Tensilica’s
Xtensa LX processor is the first licensable processor core to complete
certification on this challenging benchmark suite.
EEMBC benchmark scores, based on simulation, show that an optimized
Xtensa LX processor core is significantly faster
on a per-MHz basis than the only two other processors certified to
date, the 1 GHz PowerPC 750GX and 1.4 GHz PowerPC MPC7447A, both of
which are full-chip, standard product processors. The Xtensa LX processor
delivers this outstanding performance while simultaneously delivering
a 4X code density advantage and more than a 100X advantage in both
die area and power dissipation.
With the Networking 2.0 benchmark, EEMBC simulates real-world networking
performance with many different users and differing traffic types.
The TCPmark represents processor performance in Internet-enabled,
client-side devices. The IPmark represents processor performance in
network routers, gateways and switches.

Because EEMBC scores for licensable synthesizable
processors, such as the Xtensa LX, are expressed
on a “per-MHz” basis,
the PowerPC results were normalized to a “per-MHz” basis
for this comparison.
The total code size (aggregate total of bytes of object code) for
all twelve benchmark kernels in the Networking
Version 2 suite is:
- 65,208 bytes – Xtensa LX Optimized
- 67,256 bytes – Xtensa LX Out of the Box
- 255,764 bytes – PowerPC 750GX
- 280,984 bytes – PowerPC MPC7447A
This shows that Tensilica’s Xtensa LX has a 4X advantage in
code size.
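The 4X figure can be checked directly from the byte counts listed above. A minimal Python sketch; the dictionary and variable names are illustrative and not part of any EEMBC tooling:

```python
# Aggregate code sizes in bytes for the Networking Version 2 suite,
# copied from the list above.
code_size_bytes = {
    "Xtensa LX Optimized": 65_208,
    "Xtensa LX Out of the Box": 67_256,
    "PowerPC 750GX": 255_764,
    "PowerPC MPC7447A": 280_984,
}

baseline = code_size_bytes["Xtensa LX Optimized"]

# Ratio of each processor's code size to the optimized Xtensa LX figure.
ratios = {name: size / baseline for name, size in code_size_bytes.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x the optimized Xtensa LX code size")
```

The two PowerPC ratios come out at roughly 3.9 and 4.3, which is where the rounded 4X claim comes from.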
Tops EEMBC Office Automation Benchmark
The Xtensa LX configurable processor received
the highest score ever recorded for a licensable
processor core, and the highest absolute score
ever published for any processor, on EEMBC’s
Office Automation benchmark suite. The EEMBC benchmark
scores, independently certified by the EEMBC
Certification Laboratories (ECL), confirm that
the Xtensa LX processor is nearly four times faster
than the much larger PowerPC 440GX core, and more
than four times as fast as the 64-bit MIPS 20Kc
processor.

EEMBC Office Automation
Benchmark Scores
The certified EEMBC OAmark scores are:
- 4.19523 – Optimized Xtensa LX processor
- 1.07999 – Out-of-the-box PowerPC 440GX
processor
- 0.98880 – Out-of-the-box Xtensa LX processor
- 0.89033 – Out-of-the-box MIPS 20Kc processor
- 0.75975 – Out-of-the-box ARM 1026EJ-S
processor
EEMBC scores for licensable synthesizable processors
are expressed on a “per-MHz” basis.
The optimized configuration of Xtensa LX used in
this Office Automation benchmark certification
achieved a 454 MHz operating frequency in 90 nm
ASIC technology. At that expected operating frequency,
the 4.19523 OAmarks/MHz would yield an at-speed
score of 1904 OAmarks. The optimized version of
the Xtensa LX runs nearly four times faster than
the much larger, out-of-the-box PowerPC 440GX
core, and more than four times faster than the
out-of-the-box 64-bit MIPS 20Kc processor.
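The at-speed score and the speed ratios quoted above follow from simple arithmetic on the certified per-MHz scores. A minimal Python sketch, with illustrative variable names:

```python
# Certified OAmark scores (per MHz), copied from the list above.
oamark_per_mhz = {
    "Optimized Xtensa LX": 4.19523,
    "Out-of-the-box PowerPC 440GX": 1.07999,
    "Out-of-the-box Xtensa LX": 0.98880,
    "Out-of-the-box MIPS 20Kc": 0.89033,
    "Out-of-the-box ARM 1026EJ-S": 0.75975,
}

CLOCK_MHZ = 454  # operating frequency of the optimized configuration in 90 nm

optimized = oamark_per_mhz["Optimized Xtensa LX"]

# Absolute score at the stated clock rate.
at_speed_oamarks = optimized * CLOCK_MHZ

# Per-MHz speed ratios against the two cores named in the text.
ratio_vs_ppc440gx = optimized / oamark_per_mhz["Out-of-the-box PowerPC 440GX"]
ratio_vs_mips20kc = optimized / oamark_per_mhz["Out-of-the-box MIPS 20Kc"]

print(f"At-speed score: {at_speed_oamarks:.1f} OAmarks")
print(f"vs PowerPC 440GX: {ratio_vs_ppc440gx:.2f}x")
print(f"vs MIPS 20Kc: {ratio_vs_mips20kc:.2f}x")
```

The ratios work out to about 3.9 and 4.7, matching "nearly four times" and "more than four times" on a per-MHz basis.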
In addition to having a significant advantage
in the OAmark scores, Tensilica’s Xtensa
LX processor demonstrated much lower code size,
which means it requires less memory. Code size
results for the Office Automation benchmark were:
- 4,912 bytes – Out-of-the-box Xtensa
LX processor
- 5,908 bytes – Out-of-the-box ARM 1026EJ-S
processor
- 11,024 bytes – Optimized Xtensa LX processor
- 13,780 bytes – Out-of-the-box MIPS 20Kc
processor
- 18,540 bytes – Out-of-the-box IBM PowerPC
440 processor
Tensilica used the EEMBC-provided, ECL-certified
C code with its XPRES Compiler to generate the
optimized version of the Xtensa LX processor for
this benchmark. ANSI-standard C code tuning was
performed to expose the natural parallelism inherent
in the EEMBC benchmark code. No C intrinsics,
assembly coding, or other Xtensa-specific changes
were made to the reference EEMBC C code. The resulting
C code could be run on any processor, not just
an Xtensa LX processor.
Tops EEMBC Consumer “Out of the Box” Benchmark
The Xtensa LX configurable processor core received
the highest certified out-of-the-box score ever
recorded for any 32-bit or 64-bit processor core
tested against the Consumer benchmark suite of
the Embedded Microprocessor Benchmark Consortium
(EEMBC). The Xtensa LX processor’s score
of 0.51997 per MHz, which corresponds to 171.6
Consumermarks in a 330-MHz simulation, was nearly
nine times faster than the next best 32-bit core
and over five times as fast as the fastest 64-bit
RISC CPU tested by EEMBC.
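The Consumermark figure is the per-MHz score scaled by the simulated clock rate. A quick check in Python; the function name is illustrative:

```python
def at_speed_score(score_per_mhz: float, clock_mhz: float) -> float:
    """Scale a per-MHz EEMBC score to an absolute score at the given clock rate."""
    return score_per_mhz * clock_mhz


# Out-of-the-box Xtensa LX Consumer score and the simulated clock rate from the text.
consumermarks = at_speed_score(0.51997, 330)
print(f"{consumermarks:.1f} Consumermarks at 330 MHz")
```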

Xtensa LX outperforms
every other licensable CPU core ever tested by
EEMBC on the Consumer “Out of the Box” benchmark
Source: www.eembc.org
The “out of the box” scores are a
good test of compiler performance. The more C-friendly
the processor, the better the score, as the processor
vendor is not allowed to modify the original EEMBC
source code. The exceptional results for the Xtensa
LX processor demonstrate Tensilica’s advanced
XPRES Compiler technology.
The EEMBC Consumer benchmark suite is a compilation
of five separate benchmark kernels that are representative
of consumer digital imaging applications. The high-pass
grey-scale filter benchmark demonstrates performance
in front-end processing of digital still cameras,
showcasing 2-D data array and multiply/accumulate
capabilities. The JPEG compression and decompression
benchmarks take still images from full source data
captured from a sensor, compress to a JPEG file
format for data storage, and reconvert back to
full image representation, a common set of tasks
in consumer products such as digital still cameras
and digital video camcorders. The RGB to CMYK conversion
benchmark demonstrates a common conversion used
in color printing. The RGB to YIQ conversion benchmark
demonstrates a conversion used in NTSC encoders
for digital video processing.
Tops BDTI Benchmarks™
The Xtensa LX configurable processor core achieved
the highest score recorded to date (as of May 2004)
for a licensable processor core on the BDTI Benchmarks™ by
Berkeley Design Technology, Inc. (BDTI). The Xtensa
LX BDTIsimMark2000™ score of 6150 at 370
MHz is 70% faster than the score for the next-fastest
licensable core benchmarked by BDTI, the CEVA-X1620.*
Find out more about using the Xtensa LX configurable
processor for DSP applications.

Xtensa LX outperforms
every other licensable DSP core or
CPU core tested by BDTI
Xtensa LX configuration as tested by BDTI: 248,600 “gates” (equivalent
NAND2X cell area) at post-synthesis; 4.4 mm² actual
layout area; 3D extracted final layout timing under
worst case conditions: 369 MHz
For this benchmark, Tensilica created a unique,
optimized processor configuration. Tensilica’s
engineers used the Xtensa Processor Generator,
selecting the check-box options that fit the benchmark.
Then Tensilica’s engineers added 12 custom
instructions using the TIE (Tensilica Instruction
Extension) methodology to further accelerate performance
hot spots in the algorithms.
The configuration chosen for the BDTI Benchmarks™ is
approximately 250K gates, occupies 4.4 mm² and
is projected to achieve a robust 370 MHz clock
rate under worst case operating conditions in a
commercially available 130 nm process from a leading
wafer foundry. Tensilica reports that this high-performance
DSP core minimizes power requirements, dissipating
a mere 0.53 mW/MHz in dynamic (switching) power
under typical operating conditions – producing
a total power dissipation (dynamic plus leakage
power) of only 200 mW at a 370 MHz operating speed.**
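The total power figure can be reproduced from the per-MHz dynamic power stated above and the static leakage value given in the ** footnote. A minimal Python sketch, with illustrative constant names:

```python
DYNAMIC_MW_PER_MHZ = 0.53  # dynamic (switching) power, typical conditions
LEAKAGE_MW = 5.5           # static leakage power, taken from the ** footnote
CLOCK_MHZ = 370            # projected worst-case clock rate

# Dynamic power scales linearly with clock frequency; leakage does not.
dynamic_mw = DYNAMIC_MW_PER_MHZ * CLOCK_MHZ
total_mw = dynamic_mw + LEAKAGE_MW

print(f"dynamic: {dynamic_mw:.1f} mW, total: {total_mw:.1f} mW")
```

This gives about 196 mW of dynamic power and about 202 mW in total, consistent with the rounded 200 mW figure.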
The configuration used in the BDTI Benchmarks™ includes
Tensilica’s new Vectra LX engine, which provides
a fast path for designers that delivers ultra-high-performance
DSP capabilities. The Vectra LX DSP engine
offers a rich, general-purpose DSP instruction
set of more than 200 instructions tailored for
classic signal processing algorithms such as filters
and FFTs. It utilizes Xtensa LX’s dual load/store
units to provide full DSP capability in a
small size with low power. The Vectra LX option
can be used as-is by simply selecting it as a check-box
configuration option for an Xtensa LX processor,
or it can be delivered as a TIE source file for
use as a starting point in the development of customized
high-performance DSPs.
Download BDTI’s Report on Tensilica Xtensa
LX Processor with Vectra LX
* The BDTIsimMark2000™ provides
a summary measure of DSP speed. For more information
and scores, see www.BDTI.com. Scores © 2004
BDTI.
The Xtensa LX score assumes
use of 12 custom TIE instructions that expand the
area of the core by 16%. Licensees may require
greater or lesser degrees of customization. The
scores for all other cores assume that no coprocessors
or other customizations were used. The scores for
the Xtensa LX and all other cores are for worst
case operating conditions in a commercially available
130 nm process. Contact info@BDTI.com for
more information.
** The Xtensa LX configuration
tested consumes a static leakage power of 5.5 mW
plus dynamic switching power of 0.53 mW/MHz on
a representative computational benchmark kernel
under typical operating conditions (130 nm high-performance
process – nominal process case, operating
voltage 1.2V).