Tensilica Xtensa LX Processor
Tops EEMBC Networking 2.0 Benchmarks
Xtensa LX processor first
licensable core to certify results on newest
EEMBC benchmark
Xtensa LX Beats PowerPC
Full-Chip Results
Santa Clara,
Calif. - May 16, 2005 - Tensilica,
Inc., the only company to automate the design
of optimized application-specific configurable
processors for system-on-chip (SOC) design, today
announced that it has achieved the highest score
ever reported on the Networking Version 2.0 benchmark
suite of the Embedded Microprocessor Benchmark
Consortium (EEMBC). Tensilica's Xtensa LX processor
is the first licensable processor core to complete
certification on this challenging benchmark suite.
EEMBC benchmark scores, based on simulation,
show that an optimized Xtensa LX processor core
is significantly faster on a per-MHz basis than
the only two other processors certified to date,
the 1 GHz PowerPC 750GX and the 1.4 GHz PowerPC MPC7447A,
both of which are full-chip, standard product
processors. The Xtensa LX processor delivers
this outstanding performance while simultaneously
delivering a 4X code density advantage and more
than a 100X advantage in both die area and power
dissipation.
Multi-Core ASSP/ASIC Design
Benefits
All of today's leading-edge ASSP
and ASIC designs, and a growing number of general-purpose
processor designs, employ multiple specialized
processing engines on chip, particularly in networking
applications and now even in consumer designs.
Examples range from Cisco's performance-leading
CRS-1 terabit router, which relies upon the innovative
Cisco-designed Silicon Packet Processor built with
188 Tensilica Xtensa processor cores, to the recently
announced PlayStation Cell processor and to the
emerging "dual-core" war in the desktop PC market.
The key attributes needed in a processor
core used in a multi-core architecture are: small
physical size and low power (to maximize the number
of cores per chip); excellent code density (to
minimize the area needed for local instruction
and data memories attached to each processor core);
communication infrastructure and capabilities (to
quickly transfer data); and outstanding application-specific
or function-specific performance (so that each
core in the design can be dedicated to a specific
type of task).
The EEMBC Networking V2 results demonstrate
that the Xtensa LX core excels in all four key
attributes. [Note that Tensilica's results are
for a single Xtensa LX processor core in a configuration
that is representative of how it could be used
in an SOC design for a networking application.]
Size & Power: The Xtensa LX
processor configuration consumes a mere 1.2 square
mm in a reference high-performance 130 nm process
technology, using conventional standard-cell implementation
techniques (excluding memory area). This core is
projected to consume an estimated 115 milliwatts
of power when operated at its maximum 304 MHz operating
frequency. Contrast that miserly power figure with
that of the leading full-chip processor certified
by EEMBC Certification Labs (ECL), the Freescale
MPC7447A. This full-chip processor consumes 21W
(typical) of power [Freescale website, April 2005].
While the 7447A PowerPC chip includes area and
power for integrated memories and I/Os that contribute
to the 184X greater power dissipation, even allowing
a generous 40% of the chip area and power to these
memories and I/Os, the Xtensa LX processor enjoys
a more than 100X advantage in both area and power
consumption.
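The power comparison above can be reproduced from the quoted figures. The short Python sketch below is illustrative only: the constant and function names are ours, and it covers power alone because no PowerPC die area is quoted here. Dividing 21 W by 115 mW gives roughly 183X, close to the 184X cited, and the ratio stays above 100X after the 40% allowance for memories and I/Os.

```python
# Figures quoted in the text above.
XTENSA_POWER_W = 0.115     # 115 mW estimated at 304 MHz
MPC7447A_POWER_W = 21.0    # typical, full chip

# Allowance from the text: 40% of the PowerPC chip's power is attributed
# to integrated memories and I/Os, leaving 60% for the processor itself.
MEMORY_IO_SHARE = 0.40


def power_ratio(competitor_w: float, xtensa_w: float,
                excluded_share: float = 0.0) -> float:
    """Return how many times more power the competitor dissipates."""
    return competitor_w * (1.0 - excluded_share) / xtensa_w


full_chip_ratio = power_ratio(MPC7447A_POWER_W, XTENSA_POWER_W)
core_only_ratio = power_ratio(MPC7447A_POWER_W, XTENSA_POWER_W,
                              MEMORY_IO_SHARE)

print(f"full chip: {full_chip_ratio:.1f}X")
print(f"after 40% allowance: {core_only_ratio:.1f}X")
```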
Code Density: The Xtensa LX code
size for the EEMBC Networking V2 benchmark has been
certified by ECL at 65,208 bytes. The Freescale
MPC7447A code size is certified at 280,984 bytes.
Tensilica's Xtensa LX has a 4X advantage in code
size.
Communication Capabilities: The Xtensa
LX processor has unique Queues that allow the designer
to bypass the bus entirely, thereby increasing
throughput (see discussion of Queues below).
Performance: On a per-MHz
basis, the Xtensa LX outperforms its closest competitors
- the Freescale MPC7447A on the TCPmark of the EEMBC
benchmark and the IBM 750GX on the IPmark - by
nearly a 3X margin.
EEMBC Results
The normalized (per-MHz) EEMBC TCPmark
test scores are:
- 1.62434 - Xtensa LX Optimized
- 0.5856 - PowerPC MPC7447A
- 0.4671 - PowerPC 750GX
- 0.33762 - Xtensa LX Out of the Box
The normalized (per-MHz) EEMBC IPmark
test scores are:
- 0.82138 - Xtensa LX Optimized
- 0.2861 - PowerPC 750GX
- 0.1818 - Xtensa LX Out of the Box
- 0.1751 - PowerPC MPC7447A
(Because EEMBC scores for licensable
synthesizable processors, such as the Xtensa LX,
are expressed on a "per-MHz" basis, the PowerPC
results were normalized to a "per-MHz" basis for
this comparison.)
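The "nearly 3X" margin and the "4X to 5X" optimization speedup cited in this release follow directly from the per-MHz scores listed above. The Python sketch below simply divides those scores; the dictionary layout and the function name are illustrative choices, not part of any EEMBC tooling.

```python
# Normalized (per-MHz) scores as listed above.
TCPMARK = {
    "Xtensa LX Optimized": 1.62434,
    "PowerPC MPC7447A": 0.5856,
    "PowerPC 750GX": 0.4671,
    "Xtensa LX Out of the Box": 0.33762,
}
IPMARK = {
    "Xtensa LX Optimized": 0.82138,
    "PowerPC 750GX": 0.2861,
    "Xtensa LX Out of the Box": 0.1818,
    "PowerPC MPC7447A": 0.1751,
}


def advantage(scores: dict, subject: str, baseline: str) -> float:
    """Ratio of the subject's per-MHz score to the baseline's."""
    return scores[subject] / scores[baseline]


# Margin over the closest competitor on each mark.
tcp_margin = advantage(TCPMARK, "Xtensa LX Optimized", "PowerPC MPC7447A")
ip_margin = advantage(IPMARK, "Xtensa LX Optimized", "PowerPC 750GX")

# Speedup of the optimized configuration over the out-of-the-box one.
tcp_speedup = advantage(TCPMARK, "Xtensa LX Optimized",
                        "Xtensa LX Out of the Box")
ip_speedup = advantage(IPMARK, "Xtensa LX Optimized",
                       "Xtensa LX Out of the Box")

print(f"TCPmark margin {tcp_margin:.2f}X, IPmark margin {ip_margin:.2f}X")
print(f"TCPmark speedup {tcp_speedup:.2f}X, IPmark speedup {ip_speedup:.2f}X")
```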
With the Networking 2.0 benchmark,
EEMBC simulates real-world networking performance
with many different users and differing traffic
types. The TCPmark represents processor performance
in Internet-enabled, client-side devices. The IPmark
represents processor performance in network routers,
gateways and switches.
The total code size (aggregate total
of bytes of object code) for all twelve benchmark
kernels in the Networking Version 2 suite is:
- 65,208 bytes - Xtensa LX Optimized
- 67,256 bytes - Xtensa LX Out of the Box
- 255,764 bytes - PowerPC 750GX
- 280,984 bytes - PowerPC MPC7447A
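The code-density comparison can be checked the same way. This Python sketch, with illustrative names of our own, divides each certified code size by the optimized Xtensa LX figure and also reports how many bytes the optimized build saves relative to the out-of-the-box build.

```python
# Total object-code sizes in bytes, as listed above.
CODE_SIZE_BYTES = {
    "Xtensa LX Optimized": 65_208,
    "Xtensa LX Out of the Box": 67_256,
    "PowerPC 750GX": 255_764,
    "PowerPC MPC7447A": 280_984,
}

baseline = CODE_SIZE_BYTES["Xtensa LX Optimized"]

# How many times larger each build is than the optimized Xtensa LX build.
size_ratios = {name: size / baseline
               for name, size in CODE_SIZE_BYTES.items()}

# Reduction achieved by the optimized build over the out-of-the-box build.
bytes_saved = CODE_SIZE_BYTES["Xtensa LX Out of the Box"] - baseline
percent_saved = 100.0 * bytes_saved / CODE_SIZE_BYTES["Xtensa LX Out of the Box"]

for name, ratio in size_ratios.items():
    print(f"{name}: {ratio:.2f}X")
print(f"optimized build is {bytes_saved} bytes ({percent_saved:.1f}%) smaller")
```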
How Tensilica Achieved These
Outstanding Results
Tensilica made extensive use of custom
FLIX (Flexible Length Instruction Xtensions) instructions
in the processor configuration tested by ECL. The
tested configuration included seven different 64-bit
instruction word formats with up to eight parallel
operation slots. FLIX is a technology introduced
with the Xtensa LX processor that delivers VLIW-style
parallel execution without the "code bloat" typically
incurred by VLIW-style processors. In fact, the
dramatic 4X to 5X speedup achieved by the Optimized
Xtensa LX score versus the Out of the Box Xtensa
LX score was accompanied by a decrease in total
code size of about 3% (from 67,256 to 65,208 bytes).
In addition to the benefits of FLIX
parallelization, which provided application acceleration
across all of the 12 benchmark kernels in the EEMBC
Networking Version 2 suite of benchmarks, Tensilica
selectively employed user-defined TIE (Tensilica
Instruction Extension) Queues to dramatically accelerate
the IP packet check kernels.
Tensilica's unique user-defined Queue
capability allows SOC designers to bypass the standard
processor bus and directly import data into the
execution units of an Xtensa LX processor, much
in the same way that a dedicated hardware accelerator
block would process data in an SOC design. Whereas
conventional processors are limited to a maximum
data throughput of one 32-bit or 64-bit data read
or write every clock cycle [and hence a typical
maximum sustainable throughput on streaming network
data of one third or less of the peak transfer
rate, assuming a read-compute-write-repeat sequence],
Xtensa processors with Queues can sustain data
rates of one transfer every clock cycle for every
Queue port, and with a user-defined bandwidth of
up to 1024 bits per cycle. And Tensilica's
patented processor generator technology automatically
delivers full C compiler and Instruction Set Simulator
support for user-defined Queues.
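The "one third or less" figure follows from simple cycle counting. The Python sketch below is a toy model, not a simulation of any real processor: it assumes one cycle each for the read, compute and write steps on a single shared data port, and it treats the 1024-bit limit as applying to each Queue port. Both assumptions, and the function names, are ours.

```python
from fractions import Fraction


def bus_words_per_cycle(read_cycles: int = 1, compute_cycles: int = 1,
                        write_cycles: int = 1) -> Fraction:
    """Sustained words per cycle for a read-compute-write-repeat loop.

    One word is processed per pass through the loop, so throughput is the
    reciprocal of the cycles spent per pass (assumed non-overlapping).
    """
    return Fraction(1, read_cycles + compute_cycles + write_cycles)


def queue_bits_per_cycle(ports: int, width_bits: int) -> int:
    """Sustained bits per cycle with one transfer per cycle on every port."""
    if not 1 <= width_bits <= 1024:
        raise ValueError("queue width assumed to be 1..1024 bits")
    return ports * width_bits


# Conventional loop: one third of the one-word-per-cycle peak.
conventional = bus_words_per_cycle()

# Hypothetical configuration: one 64-bit input queue and one 64-bit output queue.
queued = queue_bits_per_cycle(ports=2, width_bits=64)

print(f"conventional: {conventional} of peak; queues: {queued} bits/cycle")
```

Any extra cycles for address generation or loop control would push the conventional figure below one third, which is why the text says "one third or less".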
Custom instructions in an Xtensa
LX processor can perform multiple queue operations
per cycle, perhaps combining inputs from two input
queues with local data and sending the computed
values to two output queues. The high bandwidth
and low control overhead of Queues allow the Xtensa
LX processor to be used in applications with extreme
data rates. IP packet manipulation in embedded
networking devices is a prime example of such a
use of TIE Queues. In an SOC design, a network
engineer would normally design custom packet header
inspection hardware in order to achieve high throughput
processing of packets. On a conventional processor,
reading in a full packet and then performing the
required header inspection and checksum calculations
takes too many clock cycles to sustain the throughput
rates required of Gigabit and 10 Gigabit systems.
Thus custom "accelerator" or "data
plane" hardware is designed to offload the conventional
control processor.
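To give a sense of the per-packet work involved, the Python sketch below checks an IPv4 header using the standard Internet checksum (RFC 1071). It is a plain software illustration of header inspection and checksum calculation in general. It is not the EEMBC kernel source and not Tensilica's implementation, and the function names are ours.

```python
import struct


def internet_checksum(data: bytes) -> int:
    """RFC 1071 checksum: one's-complement sum of 16-bit words, inverted."""
    if len(data) % 2:
        data += b"\x00"                      # pad odd-length input
    total = sum(word for (word,) in struct.iter_unpack("!H", data))
    while total >> 16:                       # fold carries back into 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def ipv4_header_is_valid(header: bytes) -> bool:
    """Inspect version and header length, then verify the header checksum."""
    if len(header) < 20:
        return False
    version = header[0] >> 4
    header_len = (header[0] & 0x0F) * 4      # IHL field counts 32-bit words
    if version != 4 or header_len < 20 or len(header) < header_len:
        return False
    # A header summed together with its own checksum field yields zero.
    return internet_checksum(header[:header_len]) == 0


# A 20-byte sample header whose checksum field is 0xB861.
sample = bytes.fromhex("45000073000040004011b861c0a80001c0a800c7")
corrupted = bytes([sample[0], sample[1] ^ 0x01]) + sample[2:]

print(ipv4_header_is_valid(sample), ipv4_header_is_valid(corrupted))
```

Every 16-bit word of the header is touched on each packet, so on a load-store machine the cost scales with the number of memory reads, which is the overhead the paragraph above describes.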
But with Xtensa LX processors, the
custom packet-processing hardware and the control
interfaces to ingress and egress channel packet-buffer
queues can be integrated into the processor. The
result: a stunning 33X speedup of the Xtensa LX
on the IP Packet Check portion of the benchmark. To
equal the level of performance of the 304 MHz Xtensa
LX on the 1 MB packet-size kernel, the PowerPC would
have to run at 6.4 GHz. And, this processor-based
design approach is far less work for the SOC hardware
team. With Tensilica's patented technology,
the Queue interfaces and custom packet-header inspection
instructions can be added to a processor within
hours, complete with fully verified RTL and software
tools and models. By contrast, conventional hardware
design requires weeks of RTL design followed by
months of verification.
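The 6.4 GHz figure can be read as an equivalent-clock comparison. If a PowerPC would need 6.4 GHz to match a 304 MHz Xtensa LX on this kernel, the implied per-clock advantage is the ratio of the two clocks. The release does not say which PowerPC part is the baseline, so the Python sketch below only divides the two quoted frequencies; the names are illustrative.

```python
XTENSA_CLOCK_MHZ = 304.0                 # tested Xtensa LX clock, from the text
POWERPC_EQUIVALENT_CLOCK_MHZ = 6400.0    # 6.4 GHz break-even clock, from the text


def implied_per_clock_advantage(equivalent_clock_mhz: float,
                                actual_clock_mhz: float) -> float:
    """Work-per-clock ratio implied by an equal-performance clock pair."""
    return equivalent_clock_mhz / actual_clock_mhz


per_clock_advantage = implied_per_clock_advantage(
    POWERPC_EQUIVALENT_CLOCK_MHZ, XTENSA_CLOCK_MHZ)

print(f"implied per-clock advantage: {per_clock_advantage:.1f}X")
```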
Tensilica's Xtensa LX processor is
the only processor that allows designers to bypass
the conventional processor-bus bottleneck in this
way. Every other processor requires that data be "fed" to
it over a bus, which is inherently much slower.
Xtensa Queues provide a high-speed mechanism to
transfer streaming data. From the programmer's
viewpoint, input and output queues behave like
traditional processor registers - with the notable
exception that data is always available, with no
need to load it before computation or store it
afterward.
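The register-like programming model can be pictured with a small software analogy. The Python sketch below models one hypothetical custom instruction that consumes a value from each of two input queues, combines them with local data, and produces a value on each of two output queues, with no separate load or store step. It is a behavioral illustration only: the operation, the names and the use of Python deques are our assumptions and do not represent TIE syntax or actual Xtensa LX behavior.

```python
from collections import deque

MASK32 = 0xFFFFFFFF  # keep results within an assumed 32-bit width


def fused_queue_op(in_a: deque, in_b: deque, local: int,
                   out_sum: deque, out_diff: deque) -> None:
    """Model one hypothetical instruction: two queue pops, two queue pushes."""
    a = in_a.popleft()                    # read input queue A
    b = in_b.popleft()                    # read input queue B
    out_sum.append((a + b + local) & MASK32)   # write output queue 1
    out_diff.append((a - b) & MASK32)          # write output queue 2


in_a, in_b = deque([10, 20, 30]), deque([1, 2, 3])
out_sum, out_diff = deque(), deque()

# Each loop pass stands in for one instruction issue.
while in_a and in_b:
    fused_queue_op(in_a, in_b, 100, out_sum, out_diff)

print(list(out_sum), list(out_diff))
```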
"By using Xtensa Queues, a
standard capability with our Xtensa LX processor,
we were able to get performance that outperforms
every other processor that has ever published EEMBC
Networking 2.0 performance data," stated Steve
Roddy, vice president of marketing for Tensilica. "Networking
customers looking for RTL-equivalent data transfer
speeds can use Xtensa LX processors and benefit
from using a programmable, rather than fixed, function
solution."
About EEMBC
EEMBC, the Embedded Microprocessor
Benchmark Consortium, develops and certifies real-world
benchmarks and benchmark scores to help designers
select the right embedded processors for their
systems. Every processor submitted for EEMBC benchmarking
is tested for parameters representing different
workloads and capabilities in communications, networking,
consumer, office automation, automotive/industrial,
embedded Java, and microcontroller-related applications.
With members including leading semiconductor, intellectual
property, and compiler companies, EEMBC establishes
benchmark standards and provides certified benchmarking
results through the EEMBC Certification Labs (ECL).
About Tensilica
Tensilica was founded in July 1997
to address the growing need for optimized, application-specific
microprocessor solutions in high-volume embedded
applications. With a configurable and extensible
microprocessor core called Xtensa, Tensilica is
the only company that has automated and patented
the time-consuming process of generating a customized
microprocessor core along with a complete software
development tool environment, producing new configurations
in a matter of hours. For more information, visit www.tensilica.com.
# # #
Editors' Notes:
- Tensilica and Xtensa are
registered trademarks belonging to Tensilica,
Inc. All other company and product names are
trademarks and/or registered trademarks of their
respective owners.
- Tensilica's announced
licensees include Agilent, ALPS, AMCC (JNI
Corporation), Astute Networks, ATI, Avision,
Bay Microsystems, Berkeley Wireless Research
Center, Broadcom, Cisco Systems, Conexant Systems,
Cypress, Crimson Microsystems, ETRI, FUJIFILM
Microdevices, Fujitsu Ltd., Hudson Soft, Hughes
Network Systems, Ikanos Communications, LG
Electronics, Marvell, NEC Laboratories America,
NEC Corporation, NetEffect, Neterion, Nippon
Telephone and Telegraph (NTT), NVIDIA, Olympus
Optical Co. Ltd., sci-worx, Seiko Epson, Solid
State Systems, Sony, STMicroelectronics, Stretch,
TranSwitch Corporation, and Victor Company
of Japan (JVC).