Tensilica Unveils Feature-Rich Third Generation
Xtensa Configurable Processor Technology
New Add-on Options Bring Configurable DSP Technology
to System Designers
San Jose, Calif., June 14,
2000 … At the Embedded Processor
Forum, Tensilica Inc., the Santa Clara-based
provider of application-specific processor technology,
announced major design improvements and new options
to the company's unique Xtensa architecture and
intellectual property suite. Xtensa III, the
third generation of the company's breakthrough
technology, includes more complete configurability
in hardware and software, more powerful features
within the Xtensa architecture, and seamless
integration of new DSP, control and media processor
capabilities into the system-on-chip (SOC) environment.
Included in the announcement were three powerful
preconfigured coprocessor options: the Vectra DSP,
a Floating-Point Unit and a 32-bit multiplier.
Rounding out the announcement was more complete
automation for designer-defined coprocessors using
an upgraded Tensilica Instruction Extension ("TIE")
Compiler, and the automated configurability of
system development environments and leading third
party RTOSes.
Tensilica president and CEO, Chris Rowen, said "The
advent of Xtensa III signals the 'coming of age'
of configurable processors. We are introducing
a suite of ever more powerful features into the
Xtensa environment having capabilities that allow
designers to span virtually all of the critical
protocol, signal and image processing tasks in addition
to the traditional control paths that 32-bit processors
have been assigned in the embedded space." "The
basis for the new coprocessors," Rowen continued, "is
the powerful capability provided by the enhanced
TIE compiler that allows the designer to build
his own system-specific solutions."
Of the new preconfigured options introduced by
the company, most significant is a powerful new
DSP coprocessor dubbed "Vectra". In addition,
the firm announced the availability of a floating
point unit (FPU), and a 32 x 32-bit multiply option.
All of these options except the Vectra DSP coprocessor
are included in the basic Xtensa
license fee.
New Options available with Xtensa III
Vectra DSP Co-processor
The Vectra DSP Option is optimized to handle high-performance digital signal
processing applications using fixed-point arithmetic. Its highly efficient,
easy-to-program vector architecture makes it well suited to communications,
audio and imaging applications. Vectra provides high data
throughput, low power dissipation and the best DSP performance per watt and
per area of any of today's core processors for SOC embedded applications. It
can be quickly configured for 8-, 16- and 24-bit fixed-point applications.
With Vectra, designers now have - for the first
time - a single core architecture that can be rapidly
configured to satisfy all of the requirements of
embedded processing: control, protocol, signal
and image processing. Like all Xtensa options,
Vectra is fully supported by software development
tools that include vectorizing compilers, assemblers,
simulators, RTOSes and optimized libraries for
popular DSP functions.
Vectra's Principal Features Are:
- Worst-case 0.18-micron performance: 200 MHz
- Outstanding power efficiency: 0.8 mW/MHz at 0.18-micron, 1.8 V; 40 µW/MMAC
in 0.15-micron, 0.9 V
- Minimum die footprint: 2.9 - 3.5 sq. mm including the Xtensa base processor
in 0.18-micron
- Powerful high-bandwidth, low-power vector register combination:
- 16 extended-precision vector registers (64 40-bit accumulators), 16 16-bit scalar
registers, 4 128-bit alignment registers
- 128-bit paths between vector memory and resident RAM
- Complete array of Tensilica and third-party software tools including compilers,
debuggers, cycle-accurate simulators and CPLD-based emulation kits
- Vectorizing compiler that enables full performance from scalar C/C++
- Optimized FFT, FIR and Viterbi libraries
- Full support for popular RTOS environments including WindRiver Tornado and
ATI Nucleus Plus
Tensilica chose a Single Instruction Multiple Data (SIMD) approach for the
Vectra DSP coprocessor. This architecture results in a boost in bandwidth and
reduced power requirements by moving all critical data closer to where it is
needed, the CPU. The vector register memory supports over 32 bytes per cycle
of source operand bandwidth and 16 bytes per cycle for transfers from local
memory to the vector file. Ample local registers mean higher code efficiency:
Vectra supports radix-4 FFT with 50% fewer loads and stores and 15% fewer ops
than radix-2 FFT. Vectra's overall performance is 2 to 7 times that of the
typical "dual MAC" DSP.
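To illustrate what four multiply-adds per cycle means for a FIR filter, the following Python sketch models a four-lane SIMD datapath. It is not Vectra code: the function names, the LANES constant and the "one vector step per group of four taps" cost model are assumptions made purely for illustration.

```python
LANES = 4  # assumed SIMD width: four multiply-adds issued together


def fir_scalar(samples, taps):
    """Reference FIR filter: one multiply-add at a time."""
    out = []
    for n in range(len(taps) - 1, len(samples)):
        acc = 0
        for k, coeff in enumerate(taps):
            acc += coeff * samples[n - k]
        out.append(acc)
    return out


def fir_simd_model(samples, taps):
    """Same filter, consuming LANES taps per modeled step; also counts steps."""
    out, steps = [], 0
    for n in range(len(taps) - 1, len(samples)):
        acc = 0
        for base in range(0, len(taps), LANES):
            group = taps[base:base + LANES]
            # One modeled vector step: up to LANES products summed at once.
            acc += sum(c * samples[n - base - j] for j, c in enumerate(group))
            steps += 1
        out.append(acc)
    return out, steps
```

Under this model an 8-tap filter needs 2 vector steps per output sample instead of 8 scalar multiply-adds, while producing the same results as the scalar loop.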
Vectra Performance Summary:
- 4 multiply-adds or 8 adds per cycle
- 4 FIR taps per cycle
- Viterbi butterfly in 2 cycles; GSM Viterbi state metric update in 5,180 cycles
- High-pass vocoder filter achieves 8 cycles/point
- 256-point complex FFT in 2,563 cycles; 1024-point complex FFT in 11,873 cycles
- Vector register memory sustains > 6.4 GB per second source operand bandwidth
(32 bytes per cycle at 200 MHz)
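The cycle counts above can be converted to wall-clock figures with simple arithmetic. The sketch below assumes that the 200 MHz worst-case 0.18-micron clock applies to every figure and that 1 GB means 10^9 bytes; the helper names are hypothetical.

```python
CLOCK_HZ = 200e6  # worst-case 0.18-micron clock quoted in the feature list


def cycles_to_microseconds(cycles, clock_hz=CLOCK_HZ):
    """Convert a cycle count to microseconds at the given clock."""
    return cycles / clock_hz * 1e6


def per_second(per_cycle, clock_hz=CLOCK_HZ):
    """Convert a per-cycle rate to a per-second rate."""
    return per_cycle * clock_hz


fft_256_us = cycles_to_microseconds(2563)    # 256-point complex FFT
fft_1024_us = cycles_to_microseconds(11873)  # 1024-point complex FFT
macs_per_s = per_second(4)                   # 4 multiply-adds per cycle
operand_bw = per_second(32)                  # 32 bytes/cycle of source operands
load_bw = per_second(16)                     # 16 bytes/cycle local memory to vector file
power_mw = 0.8 * (CLOCK_HZ / 1e6)            # 0.8 mW/MHz at 0.18-micron, 1.8 V
```

At 200 MHz this works out to roughly 12.8 microseconds for the 256-point FFT, 59.4 microseconds for the 1024-point FFT, 800 million multiply-adds per second, 6.4 GB per second of source operand bandwidth and about 160 mW.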
The first implementation of the Vectra DSP Option will be available at the
end of 3Q00, with the fully scalable and configurable version becoming available
in 1Q01. The license fee for use of this option within a core instantiation
is $150,000.
Floating Point Unit
The Xtensa III release includes a 32-bit single precision floating point coprocessor
option optimized for printing, graphics and audio applications. The principal
objective guiding the design of this coprocessor was to provide the programming
ease of floating point at the cost of fixed point processing. It adds the
logic and architectural components needed for IEEE 754 single-precision floating-point
operations.
Major Features:
- 16 dedicated floating point registers
- Full set of load/stores, offset and indexed
address update modes
- Fully pipelined arithmetic operations in hardware:
- add, sub, mul, madd, msub: 4-cycle latency
- loads and converts: 2-cycle latency
- moves, compares: 1-cycle latency
- Full compiler support for C/C++ float
Performance (0.18-micron, 1.8 V):
- Adds 20-25K gates to base processor for a
total core area of 1.2 - 1.5 square mm.
- Sustains 2 FLOPs/cycle = 400 MFLOPS in 0.18-micron.
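A small model helps show how a 4-cycle latency and a sustained rate of 2 FLOPs per cycle fit together. The sketch below assumes that one arithmetic operation can issue per cycle, that a dependent operation waits the full latency of its producer, and that the 400 MFLOPS figure corresponds to a 200 MHz clock with each madd counted as two FLOPs. These are modeling assumptions, not documented pipeline details.

```python
ARITH_LATENCY = 4  # cycles for add, sub, mul, madd, msub (from the list above)


def cycles_independent(n_ops, latency=ARITH_LATENCY):
    """Independent ops issue one per cycle; the last result arrives `latency` cycles after issue."""
    return 0 if n_ops == 0 else n_ops + latency - 1


def cycles_dependent_chain(n_ops, latency=ARITH_LATENCY):
    """Each op needs the previous result, so each one waits the full latency."""
    return n_ops * latency


def sustained_mflops(clock_mhz, flops_per_op=2, ops_per_cycle=1):
    """Peak rate when a madd (multiply plus add = 2 FLOPs) issues every cycle."""
    return clock_mhz * flops_per_op * ops_per_cycle
```

In this model, 100 independent madd operations finish in 103 cycles, while a chain of 100 dependent ones takes 400 cycles, which is why the sustained figure depends on the compiler finding independent work to schedule.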
32 x 32-bit Multiply Option
Xtensa III adds a new MUL 32 option to the library
of MAC 16 (16x16 Multiply with 40-bit accumulator)
and MUL 16 options previously released with Xtensa
I. This option provides two instructions that perform
32x32 multiplication, producing a 64-bit result.
The option requires more area than the MUL 16 option,
but in 0.18-micron technology the added cost becomes
trivial in view of the significantly greater precision
provided.
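The value of a full 32x32 multiply can be seen by comparing it with the work needed to build the same 64-bit product from 16x16 multiplies. The sketch below is a hedged illustration for unsigned operands only. It assumes the two instructions return the high and low 32-bit halves of the product, which the announcement does not state, and the function names are hypothetical.

```python
MASK16 = 0xFFFF
MASK32 = 0xFFFFFFFF


def mul32_full(a, b):
    """Unsigned 32x32 multiply returning (high word, low word) of the 64-bit product."""
    product = (a & MASK32) * (b & MASK32)
    return (product >> 32) & MASK32, product & MASK32


def mul32_from_16bit_parts(a, b):
    """Same product built from four 16x16 partial products plus shifts and adds."""
    a_lo, a_hi = a & MASK16, (a >> 16) & MASK16
    b_lo, b_hi = b & MASK16, (b >> 16) & MASK16
    product = (a_lo * b_lo
               + ((a_lo * b_hi + a_hi * b_lo) << 16)
               + ((a_hi * b_hi) << 32))
    return (product >> 32) & MASK32, product & MASK32
```

With only a 16-bit multiplier, software would need four multiplies and several shifts and adds for each 64-bit product; a 32x32 option collapses that into the two instructions mentioned above.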
Enhancements to Xtensa's Basic Processor and
Tools
Besides the powerful new optional functional
units available in Xtensa III, numerous enhancements
have been made to the core processor and tool set.
Some of the more important are:
Core Enhancements
- Set-associative caches (2-, 3- and 4-way) for instructions
and data (I&D): To keep instructions
and data flowing into the execution units, Xtensa
offers various cache size options: 1KB, 2KB,
4KB, 8KB, and 16KB instruction and data caches,
with cache line sizes of 16, 32, or 64 bytes. Instruction
and data ROM/RAM options range up to 256KB.
- Instruction and data RAM and ROM now can co-exist
with I&D caches. The RAM and ROM options
provide internal memories that are part of the
processor's address space, and accessed with
the same timing as cache. There are two RAM options:
Instruction RAM and Data RAM, and there are two
ROM options: instruction ROM and Data ROM.
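The cache options above determine how an address is divided into tag, set index and line offset. The following sketch computes that split for a set-associative cache. It assumes 32-bit byte addresses and only handles configurations with a power-of-two number of sets, so it does not cover the 3-way option; the function names are hypothetical.

```python
def cache_geometry(size_bytes, ways, line_bytes, addr_bits=32):
    """Return (sets, offset_bits, index_bits, tag_bits) for a set-associative cache."""
    sets, remainder = divmod(size_bytes, ways * line_bytes)
    if remainder or sets & (sets - 1) or sets == 0:
        raise ValueError("this sketch only handles power-of-two set counts")
    offset_bits = line_bytes.bit_length() - 1
    index_bits = sets.bit_length() - 1
    return sets, offset_bits, index_bits, addr_bits - offset_bits - index_bits


def split_address(addr, size_bytes, ways, line_bytes):
    """Split a byte address into (tag, set index, line offset)."""
    sets, offset_bits, index_bits, _ = cache_geometry(size_bytes, ways, line_bytes)
    offset = addr & (line_bytes - 1)
    index = (addr >> offset_bits) & (sets - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset
```

For example, a 16KB 4-way cache with 32-byte lines has 128 sets, giving a 5-bit offset, a 7-bit index and a 20-bit tag under the 32-bit address assumption.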
TIE Enhancements
- Boolean registers: 16 1-bit registers provide
for parallel compares and conditional moves.
- Pipelined instructions of up to four cycles:
Designers can now define new instructions
whose execution is relaxed to span
up to 4 clock cycles in the E stage of the pipeline.
By using the "schedule" directive,
the TIE extension is automatically pipelined,
and the pipeline control and decoding logic is
automatically generated. Extended instructions
that execute in multiple clocks remain fully
pipelined so that an instruction is issued on
every clock. Stall cycles are automatically generated
if a dependent operation is issued. The processor
remains an in-order single-issue machine with
instructions completing in order, one at a time.
- Designer-defined register files: A new TIE
definition enables multiple designer-defined
special register files. A register file can be
of any width (bits per register), restricted
only by the number of bits allocated in the instruction.
The TIE Compiler automatically generates load
and store instructions using the special register
files. Because register allocation for
designer-defined registers is handled automatically,
there is no need to write software in assembly language.
- Wide load/store operations with address update:
TIE language allows for 32/64/128 bit wide load/store
operations for efficient memory bandwidth utilization.
- Register file ID: Designers can specify up
to 8 coprocessor IDs for the set of states associated
with each coprocessor. By associating a coprocessor
ID with each register file, "lazy" save
and restore operations become possible and are
utilized for easy and fast context switching.
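The "lazy" save and restore idea mentioned for coprocessor IDs can be sketched as a small model: a context switch moves no coprocessor registers, and state is saved only when a different task actually touches the coprocessor. The class below is a hypothetical illustration of that general technique, not Tensilica's or any RTOS's implementation; all names are assumptions.

```python
class LazyCoprocessor:
    """Models one coprocessor whose registers are saved only when another task needs them."""

    def __init__(self, num_regs=4):
        self.regs = [0] * num_regs
        self.owner = None      # task whose state is live in the registers
        self.enabled = False   # whether the current task may use the registers directly
        self.current = None
        self.saved = {}        # task name -> saved register contents
        self.save_count = 0

    def context_switch(self, task):
        # Cheap switch: no register traffic. The coprocessor stays enabled
        # only if the incoming task already owns the live state.
        self.current = task
        self.enabled = (self.owner == task)

    def _fault_on_first_use(self):
        # Modeled "coprocessor disabled" exception handler.
        if self.owner is not None and self.owner != self.current:
            self.saved[self.owner] = list(self.regs)
            self.save_count += 1
        self.regs = list(self.saved.get(self.current, [0] * len(self.regs)))
        self.owner = self.current
        self.enabled = True

    def write(self, index, value):
        if not self.enabled:
            self._fault_on_first_use()
        self.regs[index] = value

    def read(self, index):
        if not self.enabled:
            self._fault_on_first_use()
        return self.regs[index]
```

In this model, switching to a task that never uses the coprocessor and back again costs no saves at all, which is the benefit the lazy approach is meant to deliver.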
Software Enhancements
- Development tools on Windows NT: The Xtensa
processor is delivered with a rich set of software
development tools. These tools are now available
on Windows NT 4.0 as well as Solaris. The tools
include an instruction set simulator and a GNU-based
compiler, linker and assembler.
- Enhanced compiler support: Xtensa's software
development environment is fully integrated with
the processor configuration system, supporting
ANSI C and C++ code with configuration-specific
language extensions. The compiler now allows
the user to add configurable types to support
easy programmability and automatic register allocation
of user-defined coprocessors and register files.
Aggressive optimization includes constant propagation,
common subexpression elimination, loop invariant
code motion, loop unrolling, global data flow
analysis, instruction scheduling, local and global
register allocation and jump optimization.
Bernie Rosenthal, Tensilica's Vice President
of Marketing and Business Development, said "With
Xtensa III, we are significantly adding to the
'firepower' provided by our configurable architecture,
the result of which will be the creation of a broad
variety of SOCs for communication and consumer
applications. We are particularly enthusiastic
about the Vector Integer DSP option. When Vectra
becomes fully configurable, incredible new flexibility
in DSP operations will be available to designers,
and this will continue to accelerate the transition
of configurable processor cores from the province
of early adopters into mainstream design practice."
Price and Availability
Tensilica offers customers two delivery options.
The standard option provides a firm macro in Verilog
or VHDL RTL, and supporting EDA tool scripts, test
suite, placement guidelines and the customized
software tool chain. The ruggedized option provides
a hard macro in the form of a Verilog/VHDL netlist,
GDSII using the target semiconductor vendor's cell
library, a test suite and the software tool chain.
The company's pricing structure is based upon
a licensing fee per instantiated design plus royalties
based upon units manufactured. Licensing fees for
an individually configured processor implementation
and complete software tool environment start at
$350,000. With the Vectra DSP Option, the single
instance fee is $500,000.
About Tensilica
Tensilica was founded in July 1997 to address
the fast-growing market for application-specific
microprocessor cores and software development tools
for high volume, embedded systems. Using the company's
proprietary Xtensa™ Processor Generator,
system-on-a-chip (SOC) designers can develop a
processor subsystem hardware design and a complete
software development tool environment tailored
to their specific requirements in hours.
Tensilica's solutions provide a proven, easy-to-use
methodology that enables designers to achieve optimum
application performance in minimum design time.
The Company is engaged in research, development,
and customer support from its offices in Santa
Clara, California; Waltham, Massachusetts; Princeton,
N.J.; Houston, Texas; Reading, U.K.; and Yokohama,
Japan.
Tensilica is headquartered in Santa Clara, California
(95054) at 3255-6 Scott Boulevard, and can be reached
at (408) 986-8000 or via www.tensilica.com on the
World Wide Web.
"Tensilica", "Xtensa" and "Vectra" are
trademarks of Tensilica Inc. Other
trademarks belong to their respective owners.