Xtensa LX2 DSP Options
The Fastest Licensable DSP Core Ever
See Building a Multi-Issue Vector DSP with Configurable-Processor Technology from GSPx 2004.
See BDTI's independent analysis of the Xtensa LX processor with Vectra LX.
See Microprocessor Report's article Applications Define DSP Speed.
The Xtensa LX2 processor excels at traditional CPU and DSP tasks in embedded SOCs, as demonstrated by industry-leading results on the BDTI Benchmarks™ by Berkeley Design Technology, Inc. (BDTI). The Xtensa LX configurable processor core achieved the highest score recorded to date for a licensable processor core: the Xtensa LX BDTIsimMark2000 score of 6150 at 370 MHz is 70% higher than the score for the next-fastest licensable core benchmarked by BDTI, the CEVA-X1620.*
Tensilica’s customers already use the Xtensa processor core for a variety of DSP tasks, including audio processing, image processing, video processing, and communications channel processing. Tensilica also offers specialized, pre-configured, optimized DSPs for audio and video processing.
The Xtensa LX2 processor can be used in such a wide
variety of applications because Tensilica offers
four separate means of accelerating DSP computations.
| | Moderate Performance | Very High Performance |
|---|---|---|
| Function or Application Specific Designer-Defined Instructions | Simple TIE: Single-Cycle Instructions; Single Operation per Instruction; Compiler Support Through Automatic Intrinsics | Advanced TIE: Multiple-Cycle Instructions; Multiple Operations per Instruction; SIMD Operations; Overlapping of Computation and Load/Store Operations; Compiler Support Through Automatic Intrinsics; Flexible Length Instruction Xtensions |
| General-Purpose Click-Button Configuration Options | MAC16 Function Unit Option: Single MAC DSP configuration option; Fully supported by the compiler | Vectra LX DSP Engine: Dual or Quad MAC DSP; SIMD Instruction Set; Vectorizing Compiler Support; 64-bit instruction words |
MAC16 Configuration Option
For moderate-intensity signal processing applications, a 16-bit multiply-accumulate engine can be added to the base Xtensa LX2 processor core with a single click of a configuration button in the Xtensa processor generator. The MAC16 option adds a full suite of multiply/accumulate instructions, including auto-incrementing loads and combined multiply-accumulate-load instructions for high-performance computation. These DSP instructions are also fully supported by the compiler.
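As an illustration of the kind of kernel a multiply-accumulate unit targets, the following is a minimal plain-C sketch of a 16-bit dot product. The function name `dot_i16` and the 64-bit accumulator are assumptions made for this example; the accumulator width of the actual MAC16 hardware and the way the compiler maps such a loop onto MAC16 instructions are not described here.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Dot product of two 16-bit sample arrays (hypothetical helper).
 * Each iteration is one multiply-accumulate: a 16x16-bit product
 * added to a running sum. The 64-bit accumulator is an assumption
 * chosen so the sum cannot overflow for any realistic length.
 */
int64_t dot_i16(const int16_t *x, const int16_t *h, size_t n)
{
    int64_t acc = 0;
    for (size_t i = 0; i < n; i++) {
        /* Widen before multiplying so the product is exact. */
        acc += (int32_t)x[i] * (int32_t)h[i];
    }
    return acc;
}
```

The loop body pairs two loads with one multiply-accumulate, which is the pattern that auto-incrementing loads and combined multiply-accumulate-load instructions are meant to shorten.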
Tensilica Instruction Extensions (TIE)
For designs with one or more signal processing tasks that need acceleration beyond the base RISC processor features of Xtensa LX2, the designer can quickly add instructions and hardware execution units tailored to a specific algorithm.
For example, the “butterfly” operation used in Convolutional Coding / Viterbi Decoding applications is a series of combined Add-Compare-Select (ACS) operations. If the data in question consists of 8-bit values packed in the standard 32-bit registers of Xtensa LX2, a designer can easily add an ACS instruction to the Xtensa LX2 processor, with a small incremental block of execution unit hardware, to greatly speed up Viterbi decoding for communications applications.
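To make the ACS idea concrete, here is a plain-C reference model of what a single packed ACS operation might compute on four 8-bit lanes held in 32-bit words. It is only a sketch: the function `acs4_u8`, the saturating add, the "smaller metric wins" convention, and the decision-bit layout are all assumptions for illustration, and the TIE description of a real instruction is not shown.

```c
#include <stdint.h>

/* Extract unsigned 8-bit lane k (0..3) from a packed 32-bit word. */
static uint8_t lane(uint32_t w, int k)
{
    return (uint8_t)(w >> (8 * k));
}

/* 8-bit add that clamps at 255 instead of wrapping (assumed behavior). */
static uint8_t add_sat_u8(uint8_t a, uint8_t b)
{
    unsigned s = (unsigned)a + (unsigned)b;
    return (uint8_t)(s > 255u ? 255u : s);
}

/*
 * Hypothetical packed Add-Compare-Select on four 8-bit lanes.
 * pm0/pm1: path metrics of the two predecessor states, per lane.
 * bm0/bm1: branch metrics of the two incoming branches, per lane.
 * Returns the four surviving (smaller) metrics, packed.
 * Bit k of *decisions is set when the pm1/bm1 path won in lane k.
 */
uint32_t acs4_u8(uint32_t pm0, uint32_t bm0,
                 uint32_t pm1, uint32_t bm1,
                 uint8_t *decisions)
{
    uint32_t out = 0;
    uint8_t dec = 0;

    for (int k = 0; k < 4; k++) {
        uint8_t c0 = add_sat_u8(lane(pm0, k), lane(bm0, k)); /* add     */
        uint8_t c1 = add_sat_u8(lane(pm1, k), lane(bm1, k));
        uint8_t best = c0;
        if (c1 < c0) {                                       /* compare */
            best = c1;                                       /* select  */
            dec |= (uint8_t)(1u << k);
        }
        out |= (uint32_t)best << (8 * k);
    }
    *decisions = dec;
    return out;
}
```

In software this takes a loop with several shifts, adds, and compares per lane; a designer-defined instruction would perform the equivalent work for all four lanes in dedicated hardware.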
Advanced Tensilica Instruction Extensions (TIE)
For applications with well-defined, very high
performance signal processing computational demands,
the TIE language provides a fast means of developing
extremely powerful DSP extensions. Add custom registers
and register files for unique data types. Create complex multiple-operation instructions and pipeline them automatically into multi-cycle instructions with a single one-line directive in the TIE description.
Create SIMD (single instruction, multiple data)
instructions to tackle algorithms with native data
parallelism. Use software-pipelining techniques
to create combined compute-and-load, compute-and-store
instructions for high data-rate applications that
enable continuous computation without the performance
overhead of processor load and store cycles.
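The SIMD idea can be sketched in plain C as a model of one vector multiply-accumulate that updates several lanes at once. The lane count, the types `vec4x16` and `acc4x32`, and the function `vmac4` are hypothetical names chosen for this example and do not correspond to any particular TIE-defined instruction or register file.

```c
#include <stdint.h>

#define LANES 4

/* Hypothetical vector register: four 16-bit elements. */
typedef struct { int16_t v[LANES]; } vec4x16;

/* Hypothetical accumulator register: four 32-bit sums. */
typedef struct { int32_t v[LANES]; } acc4x32;

/*
 * Model of one SIMD multiply-accumulate: every lane computes
 * acc[k] += a[k] * b[k]. In hardware all lanes would operate in
 * parallel; the loop here only models the result. The caller is
 * assumed to keep the running sums within the 32-bit range.
 */
acc4x32 vmac4(acc4x32 acc, vec4x16 a, vec4x16 b)
{
    for (int k = 0; k < LANES; k++) {
        acc.v[k] += (int32_t)a.v[k] * (int32_t)b.v[k];
    }
    return acc;
}
```

One such operation replaces four scalar multiply-accumulates, which is where algorithms with native data parallelism gain their speedup.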
FLIX (Flexible Length Instruction Xtensions)
Tensilica improved compute performance in the
Xtensa LX processor through its innovative FLIX
(Flexible Length Instruction Xtensions) architecture.
The FLIX architecture is a highly efficient implementation
of the Xtensa instruction set architecture (ISA)
that gives designers more options for cost/performance
tradeoffs. The FLIX technology provides the flexibility
to freely and modelessly intermix instructions
of various lengths (16-, 24-, or 32-/64-bit). By
packing multiple operations into a wide 32- or
64-bit instruction word, FLIX technology allows
designers to accelerate a broader class of “hot
spots” in embedded applications. FLIX eliminates
the performance and code-size drawbacks that can
occur when using a one-size-fits-all instruction
length. Compared to rigid, high-performance
processor designs that either encode only one RISC
operation per instruction or use ultra-wide 64b/128b/256b
VLIW (very long instruction word) formats, FLIX
delivers high-performance concurrent execution
exactly and only when needed, yet preserves the industry-leading code density advantages of the Xtensa processor’s native 16b/24b base architecture instruction formats.
Vectra LX DSP Engine
See our Embedded Processor Forum 2004 presentation on Vectra LX titled “A Second-Generation High-performance DSP Engine.”
The Vectra LX DSP engine can be added to the base
Xtensa LX2 processor core with just a click of a
configuration button in the Xtensa LX2 processor
generator. The Vectra LX engine takes advantage
of the FLIX architecture and uses 64-bit instruction
words containing three issue slots for ALU, multiply-accumulate,
and load/store operations. Design teams interested
in modifying the Vectra LX DSP engine for specific
configurations should contact Tensilica. The Vectra
LX engine is fully supported by the entire Tensilica
software environment including advanced auto-vectorization
capabilities in the Xtensa C/C++ Compiler (XCC).
XCC enables Vectra LX engine users to reap the
benefits of vector processing on a SIMD engine
without manual assembly-level coding.
256pt FFT (Radix-4) Performance

| Category | Configuration | Cycles |
|---|---|---|
| Simple RISC Engine | Minimal configuration Xtensa LX2 using software multiply | 155,389 cycles |
| Scalar Performance | Base Xtensa LX2 processor with MUL32 option | 23,633 cycles |
| FLIX Performance | Xtensa LX2 with Vectra LX option | 994 cycles |
The Vectra LX DSP engine cuts the 256-point FFT from 23,633 cycles to 994 cycles.
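The cycle counts quoted for the 256-point FFT can be turned into speedup ratios with a short calculation. The sketch below uses only the three published figures; the macro names and the helper `speedup` are introduced here for illustration.

```c
/* Cycle counts quoted for the 256-point radix-4 FFT. */
#define CYCLES_SOFT_MUL 155389.0  /* minimal configuration, software multiply */
#define CYCLES_MUL32     23633.0  /* base processor with MUL32 option         */
#define CYCLES_VECTRA      994.0  /* with Vectra LX option                    */

/* Speedup of a new configuration relative to a baseline, in cycles. */
double speedup(double baseline_cycles, double new_cycles)
{
    return baseline_cycles / new_cycles;
}
```

With these inputs, `speedup(CYCLES_SOFT_MUL, CYCLES_MUL32)` is about 6.6, `speedup(CYCLES_MUL32, CYCLES_VECTRA)` is about 23.8, and `speedup(CYCLES_SOFT_MUL, CYCLES_VECTRA)` is about 156. These are cycle-count ratios only and say nothing about clock frequency or silicon area.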
* The BDTIsimMark2000™ provides a summary measure of DSP speed. For more information and scores see www.BDTI.com. Scores © 2004 BDTI.