Xtensa LX2 Tops EEMBC Networking 2.0 Benchmarks
Tensilica’s Xtensa LX processor achieved
the highest score ever reported on the Networking
Version 2.0 benchmark suite of the Embedded Microprocessor
Benchmark Consortium (EEMBC). Tensilica’s
Xtensa LX processor was the first licensable processor
core to complete certification on this challenging
benchmark suite.
EEMBC benchmark scores, based on simulation,
show that an optimized Xtensa LX2 processor core
is significantly faster on a per-MHz basis than
the only two other processors certified to date,
the 1 GHz PowerPC® 750GX and 1.4 GHz PowerPC
MPC7447A, both of which are full-chip, standard
product processors. The Xtensa LX2 processor delivers
this outstanding performance while also offering
a 4X code density advantage and more
than a 100X advantage in both die area and power
dissipation.
Multi-Core ASSP/ASIC Design Benefits
All of today’s
leading-edge ASSP and ASIC designs, and a growing
number of general-purpose processor designs,
employ multiple specialized processing engines
on chip, particularly in networking applications
and, increasingly, in consumer designs. Examples range
from Cisco’s performance-leading CRS-1
terabit router, which relies upon the innovative
Cisco-designed Silicon Packet Processor built
with 188 Tensilica Xtensa processor cores, to
the recently announced PlayStation Cell processor
and the emerging “dual-core” war
in the desktop PC market.
The key attributes needed in a processor core
used in a multi-core architecture are: small
physical size and low power (to maximize the
number of cores per chip); excellent code density
(to minimize the area needed for local instruction
and data memories attached to each processor
core); communication infrastructure and capabilities
(to quickly transfer data); and outstanding application-specific
or function-specific performance (so that each
core in the design can be dedicated to a specific
type of task).
The EEMBC Networking V2 results demonstrate
that the Xtensa LX core excels in all four key
attributes. [Note that Tensilica’s results
are for a single Xtensa LX processor core in
a configuration that is representative of how
it could be used in an SOC design for a networking
application.]
Size & Power: The Xtensa LX processor configuration
occupies a mere 1.2 square mm in a reference
high-performance 130 nm process technology, using
conventional standard-cell implementation techniques
(excluding memory area). This core is projected
to consume an estimated 115 mW of power
when operated at its maximum frequency of 304 MHz.
Contrast that miserly power figure
with that of the leading full-chip processor certified
by EEMBC Certification Labs (ECL), the Freescale
MPC7447A. This full-chip processor consumes 21 W
(typical) [Freescale website, April
2005]. While the 7447A PowerPC chip includes
area and power for integrated memories and I/Os
that contribute to its more than 180X greater power dissipation,
even allowing a generous 40% of the chip area
and power to these memories and I/Os, the Xtensa
LX processor enjoys a more than 100X advantage
in both area and power consumption.
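The power comparison above is simple arithmetic, so it can be checked directly. The sketch below uses only the two figures quoted in the text (115 mW for the Xtensa LX core, 21 W typical for the MPC7447A) and applies the 40% allowance for memories and I/O. The MPC7447A die area is not given here, so only the power side of the claim is reproduced; the function name is a hypothetical helper for illustration.

```python
# Power figures quoted in the text, treated as exact for this back-of-envelope check.
XTENSA_LX_POWER_W = 0.115   # 115 mW estimate at 304 MHz, 130 nm
MPC7447A_POWER_W = 21.0     # 21 W typical (Freescale website, April 2005)


def power_ratio(competitor_w: float, xtensa_w: float, io_mem_share: float = 0.0) -> float:
    """Competitor power divided by Xtensa power, after discounting the
    fraction of competitor power attributed to on-chip memories and I/O."""
    if not 0.0 <= io_mem_share < 1.0:
        raise ValueError("io_mem_share must be in [0, 1)")
    return competitor_w * (1.0 - io_mem_share) / xtensa_w


raw = power_ratio(MPC7447A_POWER_W, XTENSA_LX_POWER_W)
discounted = power_ratio(MPC7447A_POWER_W, XTENSA_LX_POWER_W, io_mem_share=0.40)
print(f"raw ratio:           {raw:.1f}X")
print(f"after 40% allowance: {discounted:.1f}X")
```

With these rounded inputs the raw ratio comes out near 183X, and even after setting aside 40% of the PowerPC's power the ratio stays above 100X.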
Code Density: The Xtensa LX code size for the
EEMBC Networking V2 benchmark suite has been certified
by ECL at 65,208 bytes. The Freescale MPC7447A
code size is certified at 280,984 bytes. Tensilica’s
Xtensa LX has a 4X advantage in code size.
Communication Capabilities: The Xtensa LX processor
has unique Queues that allow the designer to
bypass the bus entirely, thereby increasing throughput
(see discussion of Queues below).
Performance: On a per-MHz basis, the Xtensa
LX outperforms its closest competitors (the Freescale
MPC7447A on the EEMBC TCPmark and the IBM 750GX
on the IPmark) by a margin of nearly 3X.
EEMBC Results
The normalized (per MHz) EEMBC TCPmark test
scores are:
- 1.62434 – Xtensa LX Optimized
- 0.4671 – PowerPC 750GX
- 0.5856 – PowerPC MPC7447A
- 0.33762 – Xtensa LX Out of the Box
The normalized (per MHz) EEMBC IPmark test scores
are:
- 0.82138 – Xtensa LX Optimized
- 0.2861 – PowerPC 750GX
- 0.1818 – Xtensa LX Out of the Box
- 0.1751 – PowerPC MPC7447A
(Because EEMBC scores for licensable synthesizable
processors, such as the Xtensa LX, are expressed
on a “per-MHz” basis, the PowerPC
results were normalized to a “per-MHz” basis
for this comparison.)
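Because every score in these lists is already normalized per MHz, the margins quoted in this article are plain ratios. The sketch below recomputes them from the listed scores: the Optimized Xtensa LX against the better of the two PowerPCs on each benchmark, and against the Out of the Box configuration. The dictionary and function names are hypothetical, introduced only for this illustration.

```python
# Normalized (per-MHz) EEMBC Networking 2.0 scores as reported in this article.
TCPMARK = {
    "Xtensa LX Optimized": 1.62434,
    "PowerPC 750GX": 0.4671,
    "PowerPC MPC7447A": 0.5856,
    "Xtensa LX Out of the Box": 0.33762,
}
IPMARK = {
    "Xtensa LX Optimized": 0.82138,
    "PowerPC 750GX": 0.2861,
    "Xtensa LX Out of the Box": 0.1818,
    "PowerPC MPC7447A": 0.1751,
}
POWERPCS = ("PowerPC 750GX", "PowerPC MPC7447A")
OPT = "Xtensa LX Optimized"
OOB = "Xtensa LX Out of the Box"


def margin_over_best(scores: dict, ours: str, rivals) -> tuple:
    """Return (best-scoring rival, our per-MHz score divided by that rival's)."""
    best = max(rivals, key=lambda name: scores[name])
    return best, scores[ours] / scores[best]


for label, scores in (("TCPmark", TCPMARK), ("IPmark", IPMARK)):
    rival, margin = margin_over_best(scores, OPT, POWERPCS)
    speedup = scores[OPT] / scores[OOB]
    print(f"{label}: {margin:.2f}X vs {rival}; "
          f"{speedup:.2f}X Optimized vs Out of the Box")
```

The ratios land at roughly 2.8X to 2.9X against the closest PowerPC and roughly 4.5X to 4.8X for Optimized versus Out of the Box, matching the "nearly 3X" and "4X to 5X" characterizations.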
With the Networking 2.0 benchmark, EEMBC simulates
real-world networking performance with many different
users and differing traffic types. The TCPmark
represents processor performance in Internet-enabled,
client-side devices. The IPmark represents processor
performance in network routers, gateways and
switches.
The total code size (aggregate total of bytes
of object code) for all twelve benchmark kernels
in the Networking Version 2 suite is:
- 65,208 bytes – Xtensa LX Optimized
- 67,256 bytes – Xtensa LX Out of the Box
- 255,764 bytes – PowerPC 750GX
- 280,984 bytes – PowerPC MPC7447A
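The code-size claims can be checked the same way. This sketch divides each PowerPC total by the Optimized Xtensa LX total and computes how much smaller the Optimized build is than the Out of the Box build. It uses only the byte counts listed here; the variable names are hypothetical.

```python
# Certified total object-code size (bytes) for the twelve Networking V2 kernels.
CODE_SIZE_BYTES = {
    "Xtensa LX Optimized": 65_208,
    "Xtensa LX Out of the Box": 67_256,
    "PowerPC 750GX": 255_764,
    "PowerPC MPC7447A": 280_984,
}

opt = CODE_SIZE_BYTES["Xtensa LX Optimized"]
oob = CODE_SIZE_BYTES["Xtensa LX Out of the Box"]

# How many times larger each PowerPC build is than the Optimized Xtensa LX build.
ratios = {name: CODE_SIZE_BYTES[name] / opt
          for name in ("PowerPC 750GX", "PowerPC MPC7447A")}
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}X the Optimized Xtensa LX code size")

# Percentage reduction from the Out of the Box build to the Optimized build.
reduction_pct = 100.0 * (oob - opt) / oob
print(f"Optimized vs Out of the Box: {reduction_pct:.1f}% smaller")
```

The listed figures work out to about 3.9X (750GX) and 4.3X (MPC7447A), consistent with the 4X claim. They also show the Optimized build about 3% smaller than the Out of the Box build.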
How Tensilica Achieved These Outstanding Results
Tensilica made extensive use of custom FLIX
(Flexible Length Instruction Xtensions) instructions
in the processor configuration tested by ECL.
The tested configuration included seven different
64-bit instruction word formats with up to eight
parallel operation slots. FLIX is a technology
introduced with the Xtensa LX processor that
delivers VLIW-style parallel execution without
the “code bloat” typically incurred
by VLIW-style processors. In fact, the dramatic
4X to 5X speedup achieved by the Optimized Xtensa
LX score versus the Out of the Box Xtensa LX
score was accompanied by a decrease in total
code size of about 3% (from 67,256 to 65,208 bytes).
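To illustrate why bundling need not bloat code, here is a toy size model comparing three ways of encoding the same hypothetical schedule of independent operations. The encodings are: one narrow instruction per operation (scalar), a mixed-length stream where lone operations stay narrow and parallel groups share one 64-bit bundle (FLIX-style), and a fixed-width VLIW that pads every cycle to a full bundle with NOPs. The 24-bit narrow width, the 32-bit VLIW slot width, and the schedule are assumptions chosen for illustration, not the encodings ECL tested, and the model ignores data dependences.

```python
from typing import Sequence

# Illustrative encoding parameters: assumptions for this sketch only.
NARROW_BITS = 24       # assumed width of an instruction issuing one operation
BUNDLE_BITS = 64       # one wide bundle, as in the 64-bit formats described above
MAX_SLOTS = 8          # up to eight operations per bundle
VLIW_SLOT_BITS = 32    # hypothetical fixed VLIW: 32 bits per slot...
VLIW_SLOTS = 8         # ...with all eight slots present in every bundle


def _check(ops_per_cycle: Sequence[int]) -> None:
    for n in ops_per_cycle:
        if not 1 <= n <= MAX_SLOTS:
            raise ValueError(f"cannot issue {n} operations in one cycle")


def scalar_bits(ops_per_cycle: Sequence[int]) -> int:
    """No parallelism: every operation is its own narrow instruction."""
    _check(ops_per_cycle)
    return sum(ops_per_cycle) * NARROW_BITS


def flix_style_bits(ops_per_cycle: Sequence[int]) -> int:
    """Mixed-length stream: lone operations stay narrow, parallel groups
    share one bundle, and no NOPs are emitted."""
    _check(ops_per_cycle)
    return sum(NARROW_BITS if n == 1 else BUNDLE_BITS for n in ops_per_cycle)


def fixed_vliw_bits(ops_per_cycle: Sequence[int]) -> int:
    """Fixed-width VLIW: every cycle is a full bundle; empty slots hold NOPs."""
    _check(ops_per_cycle)
    return len(ops_per_cycle) * VLIW_SLOTS * VLIW_SLOT_BITS


# Hypothetical schedule: number of independent operations issued each cycle.
schedule = [1, 1, 4, 1, 8, 2, 1, 1]
for name, fn in (("scalar", scalar_bits), ("FLIX-style", flix_style_bits),
                 ("fixed VLIW", fixed_vliw_bits)):
    print(f"{name:>10}: {fn(schedule) // 8} bytes")
print(f"cycles: scalar {sum(schedule)}, bundled {len(schedule)}")
```

Under these assumptions the mixed-length stream is both shorter in cycles and smaller in bytes than the scalar stream, while the padded VLIW stream is several times larger. That qualitative pattern mirrors the reported result but does not reproduce it.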
In addition to the benefits of FLIX parallelization,
which provided application acceleration across
all of the 12 benchmark kernels in the EEMBC
Networking Version 2 suite of benchmarks, Tensilica
selectively employed user-defined TIE (Tensilica
Instruction Extension) Queues to dramatically
accelerate the IP packet check kernels.
Tensilica’s unique user-defined Queue
capability allows SOC designers to bypass the
standard processor bus and directly import data
into the execution units of an Xtensa LX processor,
much in the same way that a dedicated hardware
accelerator block would process data in an SOC
design. Whereas conventional processors are limited
to a maximum data throughput of one 32-bit or
64-bit data read or write every clock cycle [and
hence a typical maximum sustainable throughput
on streaming network data of one third or less
of the peak transfer rate, assuming a read-compute-write-repeat
sequence], Xtensa processors with Queues can
sustain data rates of one transfer every clock
cycle for every Queue port, and with a user-defined
bandwidth of up to 1024 bits per cycle. And Tensilica’s
patented processor generator technology automatically
delivers full C compiler and Instruction Set
Simulator support for user-defined Queues.
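The bracketed throughput argument can be made concrete with a small model. In this model a bus-based processor moves at most one word per cycle, but a read-compute-write loop spends about three cycles per word of streaming input, so it sustains roughly one third of peak. Each input Queue port instead delivers one entry per cycle. Assumptions: both designs run at the same 304 MHz clock, the loop costs exactly three cycles per word, and the function names are hypothetical.

```python
CLOCK_HZ = 304e6  # maximum Xtensa LX frequency quoted in the text


def bus_sustained_bits_per_cycle(bus_bits: int, cycles_per_word: int = 3) -> float:
    """Shared load/store path: at most one bus_bits transfer per cycle. A
    read-compute-write loop needs about cycles_per_word cycles per word of
    streaming input, so only 1/cycles_per_word of peak is sustained."""
    return bus_bits / cycles_per_word


def queue_sustained_bits_per_cycle(input_queue_widths) -> int:
    """Each input Queue port delivers one entry per cycle directly to the
    execution units, so the per-port widths simply add up."""
    for width in input_queue_widths:
        if not 1 <= width <= 1024:
            raise ValueError("queue width must be between 1 and 1024 bits")
    return sum(input_queue_widths)


def gbit_per_s(bits_per_cycle: float, clock_hz: float = CLOCK_HZ) -> float:
    """Convert bits per cycle to Gbit/s at the given clock frequency."""
    return bits_per_cycle * clock_hz / 1e9


for bus in (32, 64):
    b = bus_sustained_bits_per_cycle(bus)
    print(f"{bus}-bit bus, read-compute-write: {b:.1f} bits/cycle, "
          f"{gbit_per_s(b):.1f} Gbit/s")

q = queue_sustained_bits_per_cycle([1024])
print(f"one 1024-bit input Queue: {q} bits/cycle, {gbit_per_s(q):.1f} Gbit/s")
```

Adding more Queue ports increases bandwidth linearly in this model, which is the design point the text emphasizes: throughput scales with the number and width of ports rather than with a single shared bus.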
Custom instructions in an Xtensa LX2 processor
can perform multiple queue operations per cycle,
perhaps combining inputs from two input queues
with local data and sending the computed values
to two output queues. The high bandwidth and
low control overhead of Queues allow the Xtensa
LX processor to be used in applications with
extreme data rates. IP packet manipulation in
embedded networking devices is a prime example
of such a use of TIE Queues. In an SOC design,
a network engineer would normally design custom
packet header inspection hardware in order to
achieve high-throughput processing of packets.
With a conventional processor, too many clock
cycles are required to first read in a full packet
and then perform the required header inspection
and checksum calculations to be able to sustain
the throughput rates required of Gigabit and
10-Gigabit systems. Thus custom “accelerator” or “data
plane” hardware is designed to offload
the conventional control processor.
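As a functional illustration of the header inspection and checksum work described here, the sketch below drains an ingress queue of packets, checks each IPv4 header, and pushes a verdict to an egress queue. The Python `deque` objects only stand in for hardware ingress/egress Queues; they show the dataflow, not the throughput. The checksum is the standard one's-complement sum (RFC 1071) over the IPv4 header (RFC 791), not code taken from the EEMBC IP Packet Check kernel. The function names are hypothetical.

```python
from collections import deque


def ones_complement_sum16(data: bytes) -> int:
    """RFC 1071 one's-complement sum of big-endian 16-bit words."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)  # fold the end-around carry
    return total


def header_ok(packet: bytes) -> bool:
    """Basic IPv4 header check: version 4, IHL >= 5, header fits in the
    packet, and the header checksum verifies (one's-complement sum == 0xFFFF)."""
    if len(packet) < 20:
        return False
    version, ihl = packet[0] >> 4, packet[0] & 0x0F
    header_len = ihl * 4
    if version != 4 or ihl < 5 or len(packet) < header_len:
        return False
    return ones_complement_sum16(packet[:header_len]) == 0xFFFF


def packet_check(ingress: deque, egress: deque) -> None:
    """Drain the ingress queue, pushing (packet, verdict) to the egress queue."""
    while ingress:
        pkt = ingress.popleft()
        egress.append((pkt, header_ok(pkt)))


# A 20-byte IPv4 header with a valid checksum (0xb861), and a corrupted copy.
good = bytes.fromhex("4500 0073 0000 4000 4011 b861 c0a8 0001 c0a8 00c7")
bad = bytearray(good)
bad[10] ^= 0xFF  # flip bits in the checksum field
ingress, egress = deque([good, bytes(bad)]), deque()
packet_check(ingress, egress)
print([ok for _, ok in egress])
```

In a Queue-based design the same per-packet loop would read header words directly from an input Queue port instead of loading them over the bus, which is where the speedup described in the text comes from.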
But with Xtensa LX2 processors, the custom packet-processing
hardware and the control interfaces to ingress
and egress channel packet-buffer queues can be
integrated into the processor. The result: a
stunning 33X speedup of the Xtensa LX2 on the
IP Packet Check portion of the benchmark. To
equal the level of performance of the 304 MHz
Xtensa LX2 on the 1MB packet size kernel, the
PowerPC would have to run at 6.4 GHz. And, this
processor-based design approach is far less work
for the SOC hardware team. With Tensilica’s
patented technology, the Queue interfaces and
custom packet-header inspection instructions
can be added to a processor within hours, complete
with fully verified RTL and software tools and
models. Conventional RTL hardware design requires
weeks of RTL design followed by months of verification.
Tensilica’s Xtensa LX2 processor is the
only processor that allows designers to bypass
the conventional processor-bus-bottleneck in
this way. Every other processor requires that
data be “fed” to it over a bus, which
is inherently much slower. Xtensa Queues provide
a high-speed mechanism to transfer streaming
data. From the programmer’s viewpoint, input
and output queues behave like traditional
processor registers, with the notable exception
that data is always available without the need
to load it before, or store it after, computation.
“By using Xtensa Queues, a standard capability
with our Xtensa LX processor, we were able to
get performance that outperforms every other
processor that has ever published EEMBC Networking
2.0 performance data,” stated Steve Roddy,
vice president of marketing for Tensilica. “Networking
customers looking for RTL-equivalent data transfer
speeds can use Xtensa LX processors and benefit
from using a programmable, rather than fixed,
function solution.”
About EEMBC
EEMBC, the Embedded Microprocessor
Benchmark Consortium, develops and certifies
real-world benchmarks and benchmark scores
to help designers select the right embedded processors
for their systems. Every processor submitted
for EEMBC benchmarking is tested for parameters
representing different workloads and capabilities
in communications, networking, consumer, office
automation, automotive/industrial, embedded
Java, and microcontroller-related applications.
With members including leading semiconductor,
intellectual property, and compiler companies,
EEMBC establishes benchmark standards and provides
certified benchmarking results.