Tensilica Unveils Groundbreaking Next-Generation
Xtensa LX Processor Core
Industry's Highest Performance Core Will
Replace RTL in SOC Designs
SANTA CLARA, Calif. - May 18, 2004 - Tensilica,
Inc. today unveiled its next-generation Xtensa
LX configurable processor, the highest performance
processor core on the market, featuring both
higher computational throughput and dramatically
higher I/O (input/output) bandwidth. This record-breaking
performance, combined with Tensilica's patented
automated design and development environment,
makes Xtensa LX the only processor fast and
flexible enough to replace register transfer
logic (RTL) design methodologies in system-on-chip
(SOC) designs, leading to reduced development
time and risk along with dramatic increases
in ROI (return on investment) for semiconductor
and systems companies. Xtensa LX is also ideally
suited as a traditional control processor in
embedded applications. Tensilica expects that
most of its customers will use multiple Xtensa
LX cores in each SOC design, each tailored
to speed a different part of the customer's
application.
"With chip development costs now
surging past $10 million, SOC development teams
need to reduce project development time, risk and
cost," said Chris Rowen, president and CEO of Tensilica. "With
the Xtensa LX processor, designers can configure
optimized processors specifically tuned to their
application in a fraction of the time that it takes
to design and verify RTL, with comparable computational
and I/O performance. The inherent programmability
of the processor gives designers the flexibility
to fix bugs and add features purely in software
at any point - late in the design cycle or long
after first shipment. This is impossible
with hard-coded RTL."
The Xtensa LX processor core features
significant innovations in four key areas:
- Lower power, a key requirement for all SOC
designs;
- Improved I/O throughput, so the processor can
move data in and out at terabit/second speeds;
- Improved compute performance, so the processor
can process complex algorithms much faster; and
- Better interfaces for on-chip memories, so
the processor isn't slowed down by memory access
speeds.
Tensilica supports these technical
innovations with a patented development environment
that automatically and simultaneously generates
an optimized hardware implementation, a corresponding
tailored software tool chain, and a complete set
of EDA models and scripts. Configuration
and extension choices made by the designer to address
requirements for a given application are immediately
and automatically reflected in the entire software
tool chain. With alternative approaches, this is
typically a manual, error-prone task that requires
extensive verification.
Lower Power Consumption
Tensilica has automated the insertion
of fine-grain clock gating for every functional
element of the Xtensa LX processor including functions
conceived of and created by the designer. Clock
gating is a very effective power reduction technique
that stops the clock to parts of the logic
that are not in use on a particular clock cycle.
Because automatic insertion of clock gating is
only available for restricted RTL design coding
styles, manual, error-prone post-layout tuning
of clock circuits is often required for standard
RTL design.
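As a rough illustration of why fine-grain clock gating saves power, the
following Python sketch counts how many clock edges reach each functional
unit with and without gating. The unit names and the activity trace are
hypothetical and do not come from any Tensilica tool; the sketch only
assumes that dynamic clock power grows with the number of cycles in which
a unit is clocked.

```python
# Hypothetical per-cycle activity trace: the set of functional units that
# do useful work on each clock cycle. Names and values are illustrative.
ACTIVITY = [
    {"alu"},
    {"alu", "load_store"},
    set(),
    {"multiplier"},
    {"alu"},
    {"load_store"},
]
UNITS = ["alu", "load_store", "multiplier"]


def clocked_cycles(activity, units, gated):
    """Return how many clock edges each unit receives over the trace.

    Without gating every unit is clocked on every cycle; with fine-grain
    gating a unit is clocked only on cycles in which it is in use.
    """
    counts = {}
    for unit in units:
        if gated:
            counts[unit] = sum(1 for active in activity if unit in active)
        else:
            counts[unit] = len(activity)
    return counts


ungated = clocked_cycles(ACTIVITY, UNITS, gated=False)
gated = clocked_cycles(ACTIVITY, UNITS, gated=True)
suppressed = 1 - sum(gated.values()) / sum(ungated.values())

print(f"clock edges without gating: {sum(ungated.values())}")
print(f"clock edges with gating:    {sum(gated.values())}")
print(f"fraction of clock edges suppressed: {suppressed:.0%}")
```

The savings in a real design depend on the workload and on how much of the
logic is idle in a typical cycle, so the fraction printed here should not
be read as a measured result.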
The Xtensa LX processor's new architecture
dramatically lowers power consumption in large
configurations with many designer-defined functions.
But even without designer modification, the Xtensa
LX processor is designed to use power very efficiently.
The minimum configuration of the Xtensa LX processor
dissipates a miserly 0.05 mW/MHz in a representative
130 nm process technology. By comparison,
the smallest member of the ARM synthesizable processor
family, the ARM7TDMI-S, burns 0.11 mW/MHz in 130
nm technology - more than twice the power consumption of
the Xtensa LX.
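The comparison can be checked with a few lines of Python. The two mW/MHz
figures are the ones quoted above; the 100 MHz operating point is a
hypothetical value chosen only to turn them into absolute numbers.

```python
# Power figures quoted in the text, both for 130 nm technology.
XTENSA_LX_MW_PER_MHZ = 0.05
ARM7TDMI_S_MW_PER_MHZ = 0.11

ratio = ARM7TDMI_S_MW_PER_MHZ / XTENSA_LX_MW_PER_MHZ

# Hypothetical clock rate, used only for illustration.
clock_mhz = 100
xtensa_mw = XTENSA_LX_MW_PER_MHZ * clock_mhz
arm_mw = ARM7TDMI_S_MW_PER_MHZ * clock_mhz

print(f"power ratio: {ratio:.1f}x")
print(f"at {clock_mhz} MHz: {xtensa_mw:.1f} mW vs {arm_mw:.1f} mW")
```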
I/O Throughput Improved By Three Orders
of Magnitude
Two major innovations improve I/O
throughput in Xtensa LX processors: an option for
a second load/store unit and designer-defined ports
and queues.
Designers using the Xtensa LX processor
can choose one or two 128-bit wide load/store units.
Most standard embedded processors have only a single
narrow (32- or 64-bit) load/store unit. However,
many applications benefit from two load/store units
for data-intensive inner loops - a standard feature
of many high-end DSP processors. The Xtensa LX
processor's optional second load/store unit provides
greater sustained general-purpose I/O bandwidth
and an XY-style memory access for DSP applications.
Additionally, at 128 bits, it's much wider and
can accommodate much more data than standard load/store
units.
The true breakthrough in I/O is the
capability to add designer-defined ports and queues,
which allow the Xtensa LX processor to communicate
as fast and as flexibly as RTL blocks. Ports are
wires that directly connect two Xtensa LX processors
or an Xtensa LX processor to external RTL. Port
connections can be arbitrarily wide, allowing wide
data types to be transferred easily without the
need for multiple load/store operations. As many
as one million signals (1,024 ports, each 1,024 bits wide)
can be used, and while this is an outrageous number,
far exceeding the performance demands of real systems
today (providing 350 terabits/sec of direct data
flow per processor in a 130 nm CMOS process), this
clearly demonstrates that old notions of the I/O
bottlenecks inherent in a processor-based solution
are now obsolete.
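The aggregate port figure can be approximated with simple arithmetic. The
sketch below assumes a 350 MHz clock and assumes that every port signal
can carry one new bit per clock cycle; neither assumption is spelled out
in the paragraph above, so the result is an estimate rather than a
specification.

```python
PORTS = 1024             # maximum number of ports, from the text
PORT_WIDTH_BITS = 1024   # maximum width of each port, from the text
CLOCK_HZ = 350e6         # assumed clock rate in a 130 nm process

# Total number of port signals on one processor.
signals = PORTS * PORT_WIDTH_BITS

# Assumed model: every signal transfers one bit per clock cycle.
port_tbits_per_sec = signals * CLOCK_HZ / 1e12

print(f"port signals: {signals:,}")
print(f"aggregate port bandwidth: {port_tbits_per_sec:.0f} Tbit/s")
```

Under these assumptions the result lands in the same range as the
350 terabits/sec quoted above.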
While ports are ideal to quickly
convey control and status information, queues provide
a high-speed mechanism to transfer streaming data.
Input queues and output queues appear to the programmer
like traditional processor registers
- with the notable exception that data is always
available without the need to load or store the
data before and after computation. Queues can sustain
data rates as high as one transfer every clock
cycle or over 350 Gbits/sec for each queue added
to an Xtensa LX processor. Custom instructions
can perform multiple queue operations per cycle,
perhaps combining inputs from two input queues
with local data and sending the computed values
to two output queues. The high bandwidth and low
control overhead of queues allow the Xtensa LX
processor to be used in applications with extreme
data rates.
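A minimal Python model of the queue behavior described above is sketched
here. The function fused_queue_op is a hypothetical stand-in for a
designer-defined instruction, not an actual Tensilica instruction, and
the 1,024-bit queue width and 350 MHz clock are assumptions used only to
estimate the per-queue data rate.

```python
from collections import deque

QUEUE_WIDTH_BITS = 1024  # assumed queue width (not stated in the text)
CLOCK_HZ = 350e6         # assumed clock rate


def fused_queue_op(in_a, in_b, local, out_sum, out_diff):
    """Model one hypothetical custom instruction.

    Pops one value from each input queue, combines both with a local
    value, and pushes one result to each output queue, with no explicit
    load or store in between.
    """
    a = in_a.popleft()
    b = in_b.popleft()
    out_sum.append(a + b + local)
    out_diff.append(a - b + local)


in_a, in_b = deque([10, 20, 30]), deque([1, 2, 3])
out_sum, out_diff = deque(), deque()

cycles = 0
while in_a and in_b:
    fused_queue_op(in_a, in_b, 100, out_sum, out_diff)
    cycles += 1  # idealized: one fused instruction per clock cycle

queue_gbits_per_sec = QUEUE_WIDTH_BITS * CLOCK_HZ / 1e9

print(list(out_sum), list(out_diff), cycles)
print(f"per-queue data rate: {queue_gbits_per_sec:.1f} Gbit/s")
```

Real hardware queues would also stall the processor when an input queue
is empty or an output queue is full; the sketch leaves that out for
brevity.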
Ports and queues specified by the
designer are automatically added to the Xtensa
LX processor and are fully modeled by Tensilica's
Xtensa Processor Generator. The full behavior of
the port or queue, just like any other modification
made to the Xtensa LX processor, is automatically
reflected in the custom software development tools,
instruction set simulator, bus functional model
and EDA scripts - within about an hour. And because
it's automated using Tensilica's patented technology,
it's pre-verified and correct by construction -
no need to re-verify the processor.
Improved Compute Performance
Tensilica improved compute performance
in the Xtensa LX processor through its innovative
FLIX (Flexible Length Instruction Xtensions) architecture.
The FLIX architecture is a highly efficient implementation
of the Xtensa instruction set architecture (ISA)
that gives designers more options for cost/performance
tradeoffs. The FLIX technology provides the
flexibility to freely and modelessly intermix instructions
of various lengths (16-, 24-, or 32-/64-bit). By
packing multiple operations into a wide 32- or
64-bit instruction word, FLIX technology allows
designers to accelerate a broader class of "hot
spots" in embedded applications. FLIX eliminates
the performance and code-size drawbacks that can
occur when using a one-size-fits-all instruction
length. Compared to rigid, high-performance
processor designs that either encode only one RISC
operation per instruction or use ultra-wide 64b/128b/256b
VLIW (very long instruction word) formats, FLIX
delivers high-performance concurrent execution
exactly and only when needed, yet preserves the
industry leading code density advantages of the
Xtensa processor's native 16b/24b base architecture
instruction formats.
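The code-density argument can be illustrated with a back-of-the-envelope
calculation. The instruction mix below is hypothetical, and the fixed-width
comparison assumes that every instruction occupies a full 64-bit word; a
real VLIW compiler would pack several operations into some of those words,
so the numbers only show the direction of the effect.

```python
# Hypothetical instruction mix for a small program: (width in bits, count).
# The 16- and 24-bit entries stand for base instructions; the 64-bit
# entries stand for wide multi-operation instructions used in hot spots.
MIXED_PROGRAM = [(16, 400), (24, 500), (64, 100)]
FIXED_WIDE_BITS = 64  # assumed fixed-width format for comparison


def code_size_bytes(mix):
    """Total code size in bytes for a list of (width_bits, count) pairs."""
    return sum(width * count for width, count in mix) // 8


instructions = sum(count for _, count in MIXED_PROGRAM)
mixed_bytes = code_size_bytes(MIXED_PROGRAM)
fixed_bytes = code_size_bytes([(FIXED_WIDE_BITS, instructions)])

print(f"instructions: {instructions}")
print(f"mixed-length encoding: {mixed_bytes} bytes")
print(f"fixed 64-bit encoding: {fixed_bytes} bytes")
```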
Better Interfaces to On-Chip Memories
To address the growing speed disparity
between standard cell logic and memories (memory
access speeds have not scaled as well as logic
in the migration from 180 nm to 130 nm and now
90 nm), the Xtensa LX processor features a configurable
pipeline. Designers can select two additional clock
cycles for memory access if required by the application.
While Tensilica's traditional 5-stage pipeline
is very efficient for many applications, designers
employing very large local memories or low-power
memories with slower access speeds will find advantages
in moving to a longer pipeline, resulting in a
higher system clock frequency.
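The tradeoff can be sketched with a simplified timing model. The delay
values below are hypothetical, and the model assumes that a memory access
can be spread evenly over the cycles allotted to it; it also ignores the
extra latency that a longer pipeline adds to loads and branches.

```python
def max_clock_mhz(logic_delay_ns, memory_access_ns, memory_cycles):
    """Highest clock rate when a memory access may span several cycles.

    The clock period must cover the slowest logic stage and the share of
    the memory access time that falls into a single cycle.
    """
    period_ns = max(logic_delay_ns, memory_access_ns / memory_cycles)
    return 1000.0 / period_ns


LOGIC_DELAY_NS = 2.0    # hypothetical slowest logic stage
MEMORY_ACCESS_NS = 3.5  # hypothetical access time of a slow local memory

# Memory access completes in one cycle (shorter pipeline).
shorter = max_clock_mhz(LOGIC_DELAY_NS, MEMORY_ACCESS_NS, memory_cycles=1)
# Memory access is given one extra cycle (longer pipeline).
longer = max_clock_mhz(LOGIC_DELAY_NS, MEMORY_ACCESS_NS, memory_cycles=2)

print(f"one-cycle memory access: {shorter:.1f} MHz")
print(f"two-cycle memory access: {longer:.1f} MHz")
```

In this model the clock is limited by the memory in the first case and by
the logic in the second, which is the situation the longer pipeline is
meant to address.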
Leading Benchmark Scores
In addition to being the ideal alternative
methodology for hardware block design, the Xtensa
LX processor excels at traditional CPU and DSP
tasks in embedded SOCs as demonstrated by industry
leading benchmark results on the EEMBC (Embedded
Microprocessor Benchmark Consortium) Consumer benchmark
suite and the BDTI Benchmarks™ by Berkeley Design
Technology, Inc. (BDTI).
The EEMBC Consumer benchmark "out
of the box" score was 171.6 @ 330 MHz (0.51997
per MHz), nearly a 9X performance advantage over
the ARM1020E. See separate press release issued
today titled, "Tensilica's Xtensa LX Processor
Beats All Other 32- and 64-bit Processor Cores
on EEMBC Consumer 'Out of the Box' Scores."
The Xtensa LX BDTIsimMark2000™
score of 6150 for a 370 MHz configuration is 70%
higher than the score for the next-fastest licensable
core benchmarked by BDTI, the CEVA-X1620.* See
separate press release issued today titled, "Tensilica's
New Xtensa LX Processor Earns Top BDTIsimMark2000™
Score."
Specifications
The base Xtensa LX processor consumes
approximately 27,500 gates when synthesized for
minimum power and area, and achieves 350 MHz (worst
case conditions) in TSMC's 130 nm LV process technology
when optimized for speed. In 90nm technology, the
7-stage version of Xtensa LX can achieve over 500
MHz.
Pricing and Availability
Tensilica's pricing structure is
based on a licensing fee per processor instance
plus royalties based on the volume of processors
manufactured. Each licensed processor instance
can be targeted to any silicon foundry technology.
Licensing fees for a single processor configuration
start at $550,000 for the Xtensa LX processor including
the Vectra LX DSP engine. The Xtensa Software Developers
Toolkit (which includes the Xtensa Xplorer development
environment, Xtensa C/C++ compiler, and Xtensa
Instruction Set Simulator) and the TIE Compiler are
priced separately. Customers can begin to take
advantage of the new features of the Xtensa LX
processor early this summer.
Xtensa LX is an addition to the Tensilica
processor family, which includes the proven Xtensa
V configurable processor. Customers will be able
to continue to license the Xtensa V processor.
The Xtensa V processor and the Xtensa LX processor
both implement the common core Xtensa instruction
set.
About Tensilica
Tensilica was founded in July 1997
to address the growing need for optimized, application-specific
microprocessors for high-volume embedded applications.
With the Xtensa and Xtensa LX configurable and
extensible microprocessor cores, Tensilica is the
only company that has automated and patented the
time-consuming process of generating a customized
microprocessor core along with a complete software-development
tool environment, producing new configurations
in a matter of hours. These customized processors
rival hand-coded RTL in performance and add a needed
level of programmability. For more information,
visit www.tensilica.com.
* The BDTIsimMark2000™ provides a
summary measure of DSP speed. For more information
and scores see www.BDTI.com. Scores © 2004
BDTI. The Xtensa LX score includes use of 12 custom
TIE instructions that expand the area of the core
by 16%. Licensees may require greater or
lesser degrees of customization. The scores
for all other cores assume that no coprocessors
or other customizations were used. The scores for
the Xtensa LX and all other cores are for worst
case operating conditions in a commercially available
130 nm process. Contact info@BDTI.com for
more information.
# # #
Editors' Notes:
- Tensilica and Xtensa are
registered trademarks belonging to Tensilica
Inc.
- BDTI Benchmarks and BDTIsimMark2000
are trademarks of Berkeley Design Technology,
Inc.
- Tensilica's announced licensees
include Agilent, ALPS, AMCC (JNI Corporation),
Astute Networks, Avision, Bay Microsystems, Berkeley
Wireless Research Center, Broadcom, Cisco Systems,
Conexant Systems, Cypress, Crimson Microsystems,
ETRI, FUJIFILM Microdevices, Fujitsu Ltd., Hudson
Soft, Hughes Network Systems, Ikanos Communications,
LG Electronics, Marvell, MediaWorks, NEC Laboratories
America, NEC Corporation, Nippon Telegraph and
Telephone (NTT), Olympus Optical Co. Ltd., S2io,
Solid State Systems, Sony, STMicroelectronics,
TranSwitch Corporation, and Victor Company of
Japan (JVC).