EEMBC Networking 2.0 Benchmarks

Xtensa LX2 Tops EEMBC Networking 2.0 Benchmarks

Tensilica’s Xtensa LX processor achieved the highest score ever reported on the Networking Version 2.0 benchmark suite of the Embedded Microprocessor Benchmark Consortium (EEMBC). Tensilica’s Xtensa LX processor was the first licensable processor core to complete certification on this challenging benchmark suite.

EEMBC benchmark scores, based on simulation, show that an optimized Xtensa LX2 processor core is significantly faster on a per-MHz basis than the only two other processors certified to date, the 1 GHz PowerPC® 750GX and the 1.4 GHz PowerPC MPC7447A, both of which are full-chip, standard-product processors. The Xtensa LX2 processor delivers this performance along with a 4X code density advantage and more than a 100X advantage in both die area and power dissipation.

Multi-Core ASSP/ASIC Design Benefits

All of today’s leading-edge ASSP and ASIC designs, and a growing number of general-purpose processor designs, employ multiple specialized processing engines on chip, particularly in networking applications and now even in consumer designs. Examples range from Cisco’s performance-leading CRS-1 terabit router, which relies on the Cisco-designed Silicon Packet Processor built with 188 Tensilica Xtensa processor cores, to the recently announced PlayStation Cell processor and the emerging “dual-core” war in the desktop PC market.

The key attributes needed in a processor core used in a multi-core architecture are: small physical size and low power (to maximize the number of cores per chip); excellent code density (to minimize the area needed for the local instruction and data memories attached to each processor core); communication infrastructure and capabilities (to transfer data quickly); and outstanding application-specific or function-specific performance (so that each core in the design can be dedicated to a specific type of task).

The EEMBC Networking V2 results demonstrate that the Xtensa LX core excels in all four key attributes. [Note that Tensilica’s results are for a single Xtensa LX processor core in a configuration that is representative of how it could be used in an SOC design for a networking application.]

Size & Power: The Xtensa LX processor configuration occupies a mere 1.2 square mm in a reference high-performance 130 nm process technology, using conventional standard-cell implementation techniques (excluding memory area). This core is projected to consume 115 milliwatts of power when operated at its maximum 304 MHz operating frequency. Contrast that miserly power figure with that of the leading full-chip processor certified by EEMBC Certification Labs (ECL), the Freescale MPC7447A. This full-chip processor consumes 21 W (typical) of power [Freescale website, April 2005]. The 7447A PowerPC chip includes area and power for integrated memories and I/Os that contribute to its 184X greater power dissipation. Even after attributing a generous 40% of the chip area and power to these memories and I/Os, the Xtensa LX processor enjoys a more than 100X advantage in both area and power consumption.

Code Density: The Xtensa LX code size for the EEMBC Networking V2 benchmark has been certified by ECL at 65,208 bytes. The Freescale MPC7447A code size is certified at 280,984 bytes. Tensilica’s Xtensa LX therefore has a better than 4X advantage in code size.

Communication Capabilities: The Xtensa LX processor has unique Queues that allow the designer to bypass the bus entirely, thereby increasing throughput (see discussion of Queues below).

Performance: On a per-MHz basis, the Xtensa LX outperforms the closest competitors – Freescale MPC7447A on the TCPmark of the EEMBC benchmark and the IBM 750GX on the IPmark – by nearly a 3X margin.
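
The power and code-density ratios above can be reproduced from the rounded figures quoted in the text. The short Python sketch below is only an arithmetic check; the variable names are illustrative, and because the inputs are the rounded published values, the raw power ratio comes out at about 183X rather than exactly the 184X quoted (presumably the published ratio used unrounded data).

```python
# Figures as quoted in the text above (rounded, as published).
XTENSA_POWER_W = 0.115        # 115 mW at 304 MHz, core only
MPC7447A_POWER_W = 21.0       # typical, full chip
MEM_IO_SHARE = 0.40           # share of the 7447A attributed to memories and I/O

XTENSA_CODE_BYTES = 65_208    # ECL-certified code size
MPC7447A_CODE_BYTES = 280_984

# Raw ratio: full PowerPC chip versus Xtensa LX core.
raw_power_ratio = MPC7447A_POWER_W / XTENSA_POWER_W

# Ratio after discounting 40% of the PowerPC power for memories and I/O.
adjusted_power_ratio = MPC7447A_POWER_W * (1 - MEM_IO_SHARE) / XTENSA_POWER_W

# Code-size ratio from the certified byte counts.
code_ratio = MPC7447A_CODE_BYTES / XTENSA_CODE_BYTES

print(f"raw power ratio:      {raw_power_ratio:.1f}x")
print(f"adjusted power ratio: {adjusted_power_ratio:.1f}x")
print(f"code size ratio:      {code_ratio:.2f}x")
```

Even with the 40% allowance, the adjusted power ratio stays above 100X, which matches the claim in the Size & Power paragraph. No area figure for the MPC7447A is given here, so the area claim cannot be checked the same way.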

EEMBC Results

The normalized (per MHz) EEMBC TCPmark test scores are:

  • 1.62434 – Xtensa LX Optimized
  • 0.5856 – PowerPC MPC7447A
  • 0.4671 – PowerPC 750GX
  • 0.33762 – Xtensa LX Out of the Box

The normalized (per MHz) EEMBC IPmark test scores are:

  • 0.82138 – Xtensa LX Optimized
  • 0.2861 – PowerPC 750GX
  • 0.1818 – Xtensa LX Out of the Box
  • 0.1751 – PowerPC MPC7447A

(Because EEMBC scores for licensable synthesizable processors, such as the Xtensa LX, are expressed on a “per-MHz” basis, the PowerPC results were normalized to a “per-MHz” basis for this comparison.)
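
The “nearly 3X” per-MHz margin cited earlier can be checked against the normalized scores listed above. The following Python sketch is illustrative only: the dictionary layout and the helper name `margin_over_best_rival` are assumptions made for this example, and the processor names follow the 750GX and MPC7447A designations used in the body text.

```python
# Normalized (per-MHz) scores as listed above.
TCPMARK = {
    "Xtensa LX Optimized": 1.62434,
    "PowerPC MPC7447A": 0.5856,
    "PowerPC 750GX": 0.4671,
    "Xtensa LX Out of the Box": 0.33762,
}
IPMARK = {
    "Xtensa LX Optimized": 0.82138,
    "PowerPC 750GX": 0.2861,
    "Xtensa LX Out of the Box": 0.1818,
    "PowerPC MPC7447A": 0.1751,
}

def margin_over_best_rival(scores, subject="Xtensa LX Optimized"):
    """Return (best non-Xtensa processor, subject score / that processor's score)."""
    rivals = {name: s for name, s in scores.items() if not name.startswith("Xtensa")}
    best = max(rivals, key=rivals.get)
    return best, scores[subject] / rivals[best]

for label, scores in (("TCPmark", TCPMARK), ("IPmark", IPMARK)):
    rival, margin = margin_over_best_rival(scores)
    print(f"{label}: {margin:.2f}x over {rival}")
```

On these figures the closest competitor differs by benchmark: the MPC7447A on TCPmark and the 750GX on IPmark, with margins of roughly 2.8X and 2.9X respectively.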

With the Networking 2.0 benchmark, EEMBC simulates real-world networking performance with many different users and differing traffic types. The TCPmark represents processor performance in Internet-enabled, client-side devices. The IPmark represents processor performance in network routers, gateways and switches.

The total code size (aggregate bytes of object code) for all twelve benchmark kernels in the Networking Version 2 suite is:

  • 65,208 bytes – Xtensa LX Optimized
  • 67,256 bytes – Xtensa LX Out of the Box
  • 255,764 bytes – PowerPC 750GX
  • 280,984 bytes – PowerPC MPC7447A

How Tensilica Achieved These Outstanding Results

Tensilica made extensive use of custom FLIX (Flexible Length Instruction Xtensions) instructions in the processor configuration tested by ECL. The tested configuration included seven different 64-bit instruction word formats with up to eight parallel operation slots. FLIX is a technology introduced with the Xtensa LX processor that delivers VLIW-style parallel execution without the “code bloat” typically incurred by VLIW-style processors. In fact, the dramatic 4X to 5X speedup of the Optimized Xtensa LX score over the Out of the Box Xtensa LX score was accompanied by a decrease in total code size of about 3% (from 67,256 to 65,208 bytes).
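
As a quick check of the speedup and code-size claims, the Python sketch below recomputes both from the scores and byte counts listed in the EEMBC Results section. The helper names are illustrative, not part of any Tensilica or EEMBC tool.

```python
def speedup(optimized, baseline):
    """Ratio of an optimized score to its baseline score."""
    return optimized / baseline

def percent_change(new, old):
    """Signed percentage change from old to new."""
    return 100.0 * (new - old) / old

# Optimized versus Out of the Box, per-MHz scores from the lists above.
tcp_speedup = speedup(1.62434, 0.33762)
ip_speedup = speedup(0.82138, 0.1818)

# Optimized versus Out of the Box, total code size in bytes.
code_change = percent_change(65_208, 67_256)

print(f"TCPmark speedup: {tcp_speedup:.2f}x")
print(f"IPmark speedup:  {ip_speedup:.2f}x")
print(f"code size change: {code_change:.1f}%")
```

Both speedups fall between 4X and 5X, and the listed byte counts give a code-size reduction of roughly 3%.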

In addition to the benefits of FLIX parallelization, which provided application acceleration across all of the 12 benchmark kernels in the EEMBC Networking Version 2 suite of benchmarks, Tensilica selectively employed user-defined TIE (Tensilica Instruction Extension) Queues to dramatically accelerate the IP packet check kernels.

Tensilica’s unique user-defined Queue capability allows SOC designers to bypass the standard processor bus and directly import data into the execution units of an Xtensa LX processor, much in the same way that a dedicated hardware accelerator block would process data in an SOC design. Whereas conventional processors are limited to a maximum data throughput of one 32-bit or 64-bit data read or write every clock cycle [and hence a typical maximum sustainable throughput on streaming network data of one third or less of the peak transfer rate, assuming a read-compute-write-repeat sequence], Xtensa processors with Queues can sustain data rates of one transfer every clock cycle for every Queue port, and with a user-defined bandwidth of up to 1024 bits per cycle. And Tensilica’s patented processor generator technology automatically delivers full C compiler and Instruction Set Simulator support for user-defined Queues.
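
The throughput argument above can be put into a simple back-of-the-envelope model. The Python sketch below assumes, as the text does, a three-step read-compute-write sequence on a conventional bus and one transfer per cycle on a Queue port. The function names, the choice of a single 1024-bit Queue, and the use of the 304 MHz clock from the benchmarked configuration are illustrative assumptions, not measured results.

```python
def sustained_bits_per_cycle(width_bits, cycles_per_item):
    """Average payload bits moved per clock when each item takes cycles_per_item cycles."""
    return width_bits / cycles_per_item

def gbit_per_second(bits_per_cycle, clock_mhz):
    """Convert a per-cycle rate into Gbit/s at the given clock frequency."""
    return bits_per_cycle * clock_mhz * 1e6 / 1e9

CLOCK_MHZ = 304  # assumption: same clock as the benchmarked Xtensa LX configuration

# Conventional bus: read, compute, write, so one item every three cycles.
bus_32 = sustained_bits_per_cycle(32, 3)
bus_64 = sustained_bits_per_cycle(64, 3)

# Queue port: one transfer every cycle, at the maximum user-defined width.
queue_1024 = sustained_bits_per_cycle(1024, 1)

for name, rate in (("32-bit bus", bus_32), ("64-bit bus", bus_64), ("1024-bit queue", queue_1024)):
    print(f"{name:>15}: {rate:7.1f} bits/cycle, {gbit_per_second(rate, CLOCK_MHZ):7.1f} Gbit/s")
```

Under these assumptions a single maximum-width Queue port sustains 48 times the bits per cycle of a 64-bit bus running the read-compute-write sequence. Real designs would pick the Queue width to match the data being streamed.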

Custom instructions in an Xtensa LX2 processor can perform multiple queue operations per cycle, for example combining inputs from two input queues with local data and sending the computed values to two output queues. The high bandwidth and low control overhead of Queues allow the Xtensa LX processor to be used in applications with extreme data rates. IP packet manipulation in embedded networking devices is a prime example of such a use of TIE Queues. In an SOC design, a network engineer would normally design custom packet header inspection hardware in order to achieve high-throughput processing of packets. A conventional processor needs too many clock cycles to first read in a full packet and then perform the required header inspection and checksum calculations, so it cannot sustain the throughput rates required of Gigabit and 10 Gigabit systems. Thus custom “accelerator” or “data plane” hardware is designed to offload the conventional control processor.

But with Xtensa LX2 processors, the custom packet-processing hardware and the control interfaces to ingress and egress channel packet-buffer queues can be integrated into the processor. The result: a stunning 33X speedup of the Xtensa LX2 on the IP Packet Check portion of the benchmark. To equal the level of performance of the 304 MHz Xtensa LX2 on the 1 MB packet size kernel, the PowerPC would have to run at 6.4 GHz. This processor-based design approach is also far less work for the SOC hardware team. With Tensilica’s patented technology, the Queue interfaces and custom packet-header inspection instructions can be added to a processor within hours, complete with fully verified RTL and software tools and models. Conventional RTL hardware design requires weeks of RTL design followed by months of verification.

Tensilica’s Xtensa LX2 processor is the only processor that allows designers to bypass the conventional processor-bus bottleneck in this way. Every other processor requires that data be “fed” to it over a bus, which is inherently much slower. Xtensa Queues provide a high-speed mechanism to transfer streaming data. From the programmer’s viewpoint, input queues and output queues behave like traditional processor registers, with the notable exception that data is always available without the need to load or store it before and after computation.
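
To illustrate the programming model described above, the following Python sketch models an input queue and an output queue as `collections.deque` objects and streams IPv4 header words through a standard ones-complement checksum check. It is a software analogy only, not TIE code and not the benchmark kernel. The function names, the 10-word header length (a 20-byte header without options), and the sample header values are assumptions chosen for the example.

```python
from collections import deque

def fold16(total):
    """Fold carries back into 16 bits (ones-complement addition)."""
    while total > 0xFFFF:
        total = (total & 0xFFFF) + (total >> 16)
    return total

def check_headers(in_q, out_q, words_per_header=10):
    """Pop 16-bit header words from in_q and push one pass/fail flag per header to out_q.

    Returns the number of queue pops, a stand-in for cycles if one pop
    per clock is sustained with no separate load or store step.
    """
    pops = 0
    while len(in_q) >= words_per_header:
        total = 0
        for _ in range(words_per_header):
            total += in_q.popleft()   # data arrives directly, like reading a register
            pops += 1
        # A valid IPv4 header sums to 0xFFFF when its checksum field is included.
        out_q.append(fold16(total) == 0xFFFF)
    return pops

# Sample 20-byte IPv4 header (checksum field 0xB861) and a corrupted copy.
good = [0x4500, 0x0073, 0x0000, 0x4000, 0x4011,
        0xB861, 0xC0A8, 0x0001, 0xC0A8, 0x00C7]
bad = good.copy()
bad[1] ^= 0x0001                      # flip one bit in the length field

in_q = deque(good + bad)
out_q = deque()
pops = check_headers(in_q, out_q)
print(list(out_q), pops)              # prints [True, False] 20
```

In this model the checking loop never issues an explicit load or store: each word is consumed as it is popped, and each result is pushed straight to the output queue, which is the property the paragraph above attributes to Xtensa Queues.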

“By using Xtensa Queues, a standard capability with our Xtensa LX processor, we were able to get performance that outperforms every other processor that has ever published EEMBC Networking 2.0 performance data,” stated Steve Roddy, vice president of marketing for Tensilica. “Networking customers looking for RTL-equivalent data transfer speeds can use Xtensa LX processors and benefit from using a programmable, rather than fixed, function solution.”

About EEMBC

EEMBC, the Embedded Microprocessor Benchmark Consortium, develops and certifies real-world benchmarks and benchmark scores to help designers select the right embedded processors for their systems. Every processor submitted for EEMBC benchmarking is tested for parameters representing different workloads and capabilities in communications, networking, consumer, office automation, automotive/industrial, embedded Java, and microcontroller-related applications. With members including leading semiconductor, intellectual property, and compiler companies, EEMBC establishes benchmark standards and provides certified benchmarking results.
