May 16, 2005

Tensilica Xtensa LX Processor Tops EEMBC Networking 2.0 Benchmarks

Xtensa LX processor first licensable core to certify results on newest EEMBC benchmark

Xtensa LX Beats PowerPC Full-Chip Results

Santa Clara, Calif. - May 16, 2005 - Tensilica, Inc., the only company to automate the design of optimized application-specific configurable processors for system-on-chip (SOC) design, today announced that it has achieved the highest score ever reported on the Networking Version 2.0 benchmark suite of the Embedded Microprocessor Benchmark Consortium (EEMBC). Tensilica's Xtensa LX processor is the first licensable processor core to complete certification on this challenging benchmark suite. EEMBC benchmark scores, based on simulation, show that an optimized Xtensa LX processor core is significantly faster on a per-MHz basis than the only two other processors certified to date, the 1 GHz PowerPC 750GX and the 1.4 GHz PowerPC MPC7447A, both of which are full-chip, standard product processors. The Xtensa LX processor delivers this performance while also providing a 4X code density advantage and more than a 100X advantage in both die area and power dissipation.

Multi-Core ASSP/ASIC Design Benefits

All of today's leading-edge ASSP and ASIC designs, and a growing number of general-purpose processor designs, employ multiple specialized processing engines on chip, particularly in networking applications and now even in consumer designs. Examples range from Cisco's performance-leading CRS-1 terabit router, which relies upon the innovative Cisco-designed Silicon Packet Processor built with 188 Tensilica Xtensa processor cores, to the recently announced PlayStation Cell processor and the emerging "dual-core" war in the desktop PC market.

The key attributes needed in a processor core used in a multi-core architecture are: small physical size and low power (to maximize the number of cores per chip); excellent code density (to minimize the area needed for the local instruction and data memories attached to each processor core); communication infrastructure and capabilities (to transfer data quickly); and outstanding application-specific or function-specific performance (so that each core in the design can be dedicated to a specific type of task).

The EEMBC Networking V2 results demonstrate that the Xtensa LX core excels in all four key attributes. [Note that Tensilica's results are for a single Xtensa LX processor core in a configuration that is representative of how it could be used in an SOC design for a networking application.]

Size & Power: The Xtensa LX processor configuration occupies a mere 1.2 square mm in a reference high-performance 130 nm process technology, using conventional standard-cell implementation techniques (excluding memory area). This core is projected to consume about 115 milliwatts of power when operated at its maximum 304 MHz operating frequency. Contrast that miserly power figure with that of the leading full-chip processor certified by EEMBC Certification Labs (ECL), the Freescale MPC7447A, which consumes 21 W (typical) [Freescale website, April 2005]. The 7447A PowerPC chip includes area and power for integrated memories and I/Os, which contribute to its 184X greater power dissipation. Even attributing a generous 40% of the chip area and power to these memories and I/Os, the Xtensa LX processor enjoys a more than 100X advantage in both area and power consumption.

Code Density: The Xtensa LX code size for the EEMBC Networking V2 benchmark has been certified by ECL at 65,208 bytes. The Freescale MPC7447A code size is certified at 280,984 bytes. This gives Tensilica's Xtensa LX a roughly 4X advantage in code size.

Communication Capabilities: The Xtensa LX processor has unique Queues that allow the designer to bypass the bus entirely, thereby increasing throughput (see discussion of Queues below).

Performance: On a per-MHz basis, the Xtensa LX outperforms its closest competitors (the Freescale MPC7447A on the TCPmark of the EEMBC benchmark and the IBM PowerPC 750GX on the IPmark) by nearly a 3X margin.
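The power comparison under Size & Power can be checked with a few lines of arithmetic. The sketch below uses only the power figures quoted in this release (115 mW, 21 W and the 40% allowance); the function and variable names are illustrative. With these rounded inputs the raw ratio comes out near 183X, in line with the roughly 184X quoted, and the discounted ratio stays above 100X. The release gives no die area for the MPC7447A, so area is not modeled.

```python
# Power figures as quoted in the release.
XTENSA_LX_POWER_W = 0.115   # estimated core power at 304 MHz
MPC7447A_POWER_W = 21.0     # typical full-chip power (Freescale website, April 2005)
MEMORY_IO_SHARE = 0.40      # share of chip power attributed to memories and I/Os


def power_ratio(chip_w: float, core_w: float, excluded_share: float = 0.0) -> float:
    """Ratio of chip power to core power after discounting an excluded share."""
    return chip_w * (1.0 - excluded_share) / core_w


raw = power_ratio(MPC7447A_POWER_W, XTENSA_LX_POWER_W)
adjusted = power_ratio(MPC7447A_POWER_W, XTENSA_LX_POWER_W, MEMORY_IO_SHARE)

print(f"Full-chip vs. core power ratio: {raw:.0f}X")
print(f"After discounting memories and I/Os: {adjusted:.0f}X")
```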

EEMBC Results

The normalized (per MHz) EEMBC TCPmark test scores are:

  • 1.62434 - Xtensa LX Optimized
  • 0.5856 - PowerPC MPC7447A
  • 0.4671 - PowerPC 750GX
  • 0.33762 - Xtensa LX Out of the Box

The normalized (per MHz) EEMBC IPmark test scores are:

  • 0.82138 - Xtensa LX Optimized
  • 0.2861 - PowerPC 750GX
  • 0.1818 - Xtensa LX Out of the Box
  • 0.1751 - PowerPC MPC7447A

(Because EEMBC scores for licensable synthesizable processors, such as the Xtensa LX, are expressed on a "per-MHz" basis, the PowerPC results were normalized to a "per-MHz" basis for this comparison.)
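The margins claimed in this release follow directly from the per-MHz scores listed above. The following sketch simply divides the Optimized Xtensa LX score by each of the other certified scores; the dictionary layout and the helper name `ratios_vs` are illustrative, not part of any EEMBC tooling.

```python
# Certified per-MHz scores as listed in the release.
TCPMARK = {
    "Xtensa LX Optimized": 1.62434,
    "PowerPC MPC7447A": 0.5856,
    "PowerPC 750GX": 0.4671,
    "Xtensa LX Out of the Box": 0.33762,
}
IPMARK = {
    "Xtensa LX Optimized": 0.82138,
    "PowerPC 750GX": 0.2861,
    "Xtensa LX Out of the Box": 0.1818,
    "PowerPC MPC7447A": 0.1751,
}


def ratios_vs(scores: dict, subject: str) -> dict:
    """Return subject_score / other_score for every other entry."""
    base = scores[subject]
    return {name: base / s for name, s in scores.items() if name != subject}


tcp_ratios = ratios_vs(TCPMARK, "Xtensa LX Optimized")
ip_ratios = ratios_vs(IPMARK, "Xtensa LX Optimized")

for label, ratios in (("TCPmark", tcp_ratios), ("IPmark", ip_ratios)):
    for name, ratio in ratios.items():
        print(f"{label}: {ratio:.2f}X vs. {name}")
```

The smallest ratio in each list is the margin over the closest competitor; the ratio against the Out of the Box configuration is the speedup from optimization.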

With the Networking 2.0 benchmark, EEMBC simulates real-world networking performance with many different users and differing traffic types. The TCPmark represents processor performance in Internet-enabled, client-side devices. The IPmark represents processor performance in network routers, gateways and switches.

The total code size (aggregate bytes of object code) for all twelve benchmark kernels in the Networking Version 2 suite is:

  • 65,208 bytes - Xtensa LX Optimized
  • 67,256 bytes - Xtensa LX Out of the Box
  • 255,764 bytes - PowerPC 750GX
  • 280,984 bytes - PowerPC MPC7447A

How Tensilica Achieved These Outstanding Results

Tensilica made extensive use of custom FLIX (Flexible Length Instruction Xtensions) instructions in the processor configuration tested by ECL. The tested configuration included seven different 64-bit instruction word formats with up to eight parallel operation slots. FLIX is a technology introduced with the Xtensa LX processor that delivers VLIW-style parallel execution without the "code bloat" typically incurred by VLIW-style processors. In fact, the dramatic 4X to 5X speedup of the Optimized Xtensa LX configuration over the Out of the Box configuration was accompanied by a decrease in total code size of nearly 2%.

In addition to the benefits of FLIX parallelization, which provided application acceleration across all of the 12 benchmark kernels in the EEMBC Networking Version 2 suite of benchmarks, Tensilica selectively employed user-defined TIE (Tensilica Instruction Extension) Queues to dramatically accelerate the IP packet check kernels. 

Tensilica's unique user-defined Queue capability allows SOC designers to bypass the standard processor bus and directly import data into the execution units of an Xtensa LX processor, much as a dedicated hardware accelerator block would process data in an SOC design. Conventional processors are limited to a maximum data throughput of one 32-bit or 64-bit data read or write every clock cycle, and hence to a typical maximum sustainable throughput on streaming network data of one third or less of the peak transfer rate, assuming a read-compute-write-repeat sequence. Xtensa processors with Queues, by contrast, can sustain data rates of one transfer every clock cycle for every Queue port, with a user-defined bandwidth of up to 1024 bits per cycle. Tensilica's patented processor generator technology also automatically delivers full C compiler and Instruction Set Simulator support for user-defined Queues.
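The "one third or less" figure above can be illustrated with a simple cycle-count model. This is a rough sketch under stated assumptions, not a model of any real bus or of the tested Xtensa LX configuration: each of the read, compute and write steps is assumed to take exactly one cycle, the 304 MHz clock from this release is applied to both cases purely for comparison, and the 1024-bit width is the stated maximum rather than the width used in the benchmark. All names are hypothetical.

```python
def sustained_fraction(cycles_per_word: int) -> float:
    """Fraction of the peak transfer rate sustained when each word costs this many cycles."""
    return 1.0 / cycles_per_word


def peak_bandwidth_gbps(width_bits: int, clock_mhz: float) -> float:
    """Peak bandwidth in Gbit/s for one transfer of width_bits per clock cycle."""
    return width_bits * clock_mhz * 1e6 / 1e9


CLOCK_MHZ = 304  # assumed for both cases, for comparison only

# (label, data width in bits, cycles spent per word)
cases = [
    ("64-bit bus, read-compute-write", 64, 3),   # one cycle each for read, compute, write
    ("1024-bit Queue port", 1024, 1),            # one transfer every cycle
]

for label, width, cycles in cases:
    peak = peak_bandwidth_gbps(width, CLOCK_MHZ)
    sustained = peak * sustained_fraction(cycles)
    print(f"{label}: peak {peak:.1f} Gbit/s, sustained {sustained:.1f} Gbit/s")
```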

Custom instructions in an Xtensa LX processor can perform multiple queue operations per cycle, perhaps combining inputs from two input queues with local data and sending the computed values to two output queues. The high bandwidth and low control overhead of Queues allow the Xtensa LX processor to be used in applications with extreme data rates. IP packet manipulation in embedded networking devices is a prime example of such a use of TIE Queues. In an SOC design, a network engineer would normally design custom packet header inspection hardware in order to achieve high-throughput processing of packets. A conventional processor needs too many clock cycles to first read in a full packet and then perform the required header inspection and checksum calculations to sustain the throughput rates required of Gigabit and 10 Gigabit systems. Thus custom "accelerator" or "data plane" hardware is designed to offload the conventional control processor.

But with Xtensa LX processors, the custom packet-processing hardware and the control interfaces to ingress and egress channel packet-buffer queues can be integrated into the processor. The result: a stunning 33X speedup of the Xtensa LX on the IP Packet Check portion of the benchmark. To equal the performance of the 304 MHz Xtensa LX on the 1 MB packet size kernel, the PowerPC would have to run at 6.4 GHz. This processor-based design approach is also far less work for the SOC hardware team. With Tensilica's patented technology, the Queue interfaces and custom packet-header inspection instructions can be added to a processor within hours, complete with fully verified RTL and software tools and models. By contrast, conventional RTL hardware design requires weeks of RTL design followed by months of verification.
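For readers unfamiliar with the header inspection and checksum work discussed above, the following is a minimal software sketch of an IPv4 header check, assuming the standard Internet checksum (the one's-complement sum of 16-bit words). It is not the EEMBC kernel and not Tensilica's TIE implementation; the function names and the sample header are illustrative only. It shows the word-by-word loop that a conventional processor must execute per packet and that custom instructions fed by Queues are meant to accelerate.

```python
import struct


def internet_checksum(data: bytes) -> int:
    """One's-complement sum of 16-bit big-endian words, then complemented."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:   # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def ipv4_header_ok(header: bytes) -> bool:
    """Basic IPv4 header checks: version, header length and checksum."""
    if len(header) < 20:
        return False
    version, ihl = header[0] >> 4, header[0] & 0x0F
    if version != 4 or ihl < 5 or len(header) < ihl * 4:
        return False
    # A header carrying a correct checksum sums to zero after complementing.
    return internet_checksum(header[: ihl * 4]) == 0


# Illustrative 20-byte header with a valid checksum field (0xb861).
sample = bytes.fromhex("45000073000040004011b861c0a80001c0a800c7")
print(ipv4_header_ok(sample))
```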

Tensilica's Xtensa LX processor is the only processor that allows designers to bypass the conventional processor-bus bottleneck in this way. Every other processor requires that data be "fed" to it over a bus, which is inherently much slower. Xtensa Queues provide a high-speed mechanism to transfer streaming data. From the programmer's viewpoint, input and output queues operate like traditional processor registers, with the notable exception that data is always available without the need to load or store it before and after computation.

 "By using Xtensa Queues, a standard capability with our Xtensa LX processor, we were able to get performance that outperforms every other processor that has ever published EEMBC Networking 2.0 performance data," stated Steve Roddy, vice president of marketing for Tensilica. "Networking customers looking for RTL-equivalent data transfer speeds can use Xtensa LX processors and benefit from using a programmable, rather than fixed, function solution."

About EEMBC

EEMBC, the Embedded Microprocessor Benchmark Consortium, develops and certifies real-world benchmarks and benchmark scores to help designers select the right embedded processors for their systems. Every processor submitted for EEMBC benchmarking is tested for parameters representing different workloads and capabilities in communications, networking, consumer, office automation, automotive/industrial, embedded Java, and microcontroller-related applications. With members including leading semiconductor, intellectual property, and compiler companies, EEMBC establishes benchmark standards and provides certified benchmarking results through the EEMBC Certification Labs (ECL).

About Tensilica

Tensilica was founded in July 1997 to address the growing need for optimized, application-specific microprocessor solutions in high-volume embedded applications. With a configurable and extensible microprocessor core called Xtensa, Tensilica is the only company that has automated and patented the time-consuming process of generating a customized microprocessor core along with a complete software development tool environment, producing new configurations in a matter of hours. For more information, visit www.tensilica.com .

# # #

Editors' Notes:

  • Tensilica and Xtensa are registered trademarks belonging to Tensilica, Inc. All other company and product names are trademarks and/or registered trademarks of their respective owners.
  • Tensilica's announced licensees include Agilent, ALPS, AMCC (JNI Corporation), Astute Networks, ATI, Avision, Bay Microsystems, Berkeley Wireless Research Center, Broadcom, Cisco Systems, Conexant Systems, Cypress, Crimson Microsystems, ETRI, FUJIFILM Microdevices, Fujitsu Ltd., Hudson Soft, Hughes Network Systems, Ikanos Communications, LG Electronics, Marvell, NEC Laboratories America, NEC Corporation, NetEffect, Neterion, Nippon Telegraph and Telephone (NTT), NVIDIA, Olympus Optical Co. Ltd., sci-worx, Seiko Epson, Solid State Systems, Sony, STMicroelectronics, Stretch, TranSwitch Corporation, and Victor Company of Japan (JVC).