Turn Your Xtensa Processor into a Powerful DSP

Xtensa LX4 DSP Options

More DSP Options than Any Other Core

See Building a Multi-Issue Vector DSP with Configurable-Processor Technology from GSPx 2004.

See BDTI's independent analysis of the Xtensa LX processor with Vectra LX.

See Microprocessor Report's article Applications Define DSP Speed.

The Xtensa LX4 processor excels at traditional CPU and DSP tasks in embedded SOCs, as demonstrated by industry-leading results on the BDTI Benchmarks™ by Berkeley Design Technology, Inc. (BDTI). The Xtensa LX configurable processor core achieved the highest score recorded to date for a licensable processor core: the Xtensa LX BDTIsimMark2000 score of 6150 at 370 MHz is 70% higher than the score for the next-fastest licensable core benchmarked by BDTI, the CEVA-X1620.*

Tensilica’s customers already use the Xtensa processor core for a variety of DSP tasks, including audio processing, image processing, video processing, and communications channel processing. Additionally, Tensilica offers specialized, pre-configured, optimized DSPs for audio, communications, and video processing.

The Xtensa LX processor can be used in such a wide variety of applications because Tensilica offers several means of accelerating DSP computations.

MAC16 Configuration Option

For moderate-intensity signal processing applications, a 16-bit multiply-accumulate engine can be added to the base Xtensa LX processor core with just a click of a configuration button in the Xtensa processor generator. The MAC16 option adds a full suite of multiply-accumulate instructions, including auto-incrementing loads and combined multiply-accumulate-load instructions for high-performance computation. These DSP instructions are also 100% compiler supported.
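As a rough illustration of the workload this option targets, the sketch below shows a 16-bit dot product in portable C. Each loop iteration performs one 16x16 multiply-accumulate and steps two pointers through memory, which is the pattern that multiply-accumulate and auto-incrementing load instructions are meant to serve. The function name dot_product_i16 is hypothetical, and whether a given compiler maps this loop onto MAC16 instructions is an assumption, not something verified here.

```c
#include <stddef.h>
#include <stdint.h>

/* Dot product of two 16-bit vectors (hypothetical helper for illustration).
 * Each iteration is one 16x16 multiply-accumulate fed by two sequential loads.
 * A 64-bit accumulator is used here so the portable C version cannot overflow
 * for any realistic vector length; it does not model a specific hardware
 * accumulator width. */
int64_t dot_product_i16(const int16_t *a, const int16_t *b, size_t n)
{
    int64_t acc = 0;
    for (size_t i = 0; i < n; ++i) {
        /* Widen before multiplying so the 16x16 product is formed in 32 bits. */
        acc += (int32_t)a[i] * (int32_t)b[i];
    }
    return acc;
}
```

For example, calling dot_product_i16 on {1, 2, 3, 4} and {5, 6, 7, 8} returns 70.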

ConnX Vectra LX DSP Engine

See our Embedded Processor Forum 2004 presentation on Vectra LX, titled “A Second-Generation High-performance DSP Engine.”

The ConnX Vectra LX communications DSP engine can be added to the base Xtensa LX processor core with just a click of a configuration button in the Xtensa LX processor generator. The ConnX Vectra LX engine takes advantage of the FLIX architecture and uses 64-bit instruction words containing three issue slots for ALU, multiply-accumulate, and load/store operations. Design teams interested in modifying the ConnX Vectra LX DSP engine for specific configurations should contact Tensilica. The ConnX Vectra LX engine is fully supported by the entire Tensilica software environment, including advanced auto-vectorization capabilities in the Xtensa C/C++ Compiler (XCC). XCC lets ConnX Vectra LX users reap the benefits of vector processing on a SIMD engine without manual assembly-level coding.

Configuration | Description | Cycles
Simple RISC Engine | Minimal configuration Xtensa LX using software multiply | 155,389 cycles
Scalar Performance | Base Xtensa LX processor with MUL32 option | 23,633 cycles
FLIX Performance | Xtensa LX with Vectra LX option | 994 cycles

Vectra LX DSP engine really accelerates FFT performance
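To put the cycle counts above in relative terms, the speedup of one configuration over another is simply the ratio of their cycle counts. The small helper below (a hypothetical function, not part of any Tensilica tool) computes that ratio. With the published figures, the Vectra LX configuration comes out roughly 156 times faster than the minimal configuration and roughly 24 times faster than the MUL32 configuration, assuming all three runs execute the same workload at the same clock rate.

```c
/* Speedup of an improved configuration over a baseline, from cycle counts.
 * Assumes both counts measure the same workload at the same clock frequency. */
double speedup(unsigned long baseline_cycles, unsigned long improved_cycles)
{
    return (double)baseline_cycles / (double)improved_cycles;
}
```

For instance, speedup(155389, 994) compares the minimal configuration with the Vectra LX configuration, and speedup(23633, 994) compares the MUL32 configuration with the Vectra LX configuration.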

The ConnX Vectra DSP engine is available with one or two load/store units.

ConnX D2 DSP Engine

The ConnX D2 DSP engine is a click-box option for Xtensa LX4 that is ideal for 16-bit communications DSP functions. The ConnX D2 DSP engine delivers outstanding performance from C code. See the ConnX D2 DSP engine pages.

ConnX Vectra VMB

The ConnX Vectra VMB (Viterbi, 8x20-bit multiply-accumulate, and bit unpacking) option is for baseband communications acceleration. It includes instructions targeted at FIR and IIR filtering, bit stream unpacking, and Viterbi trellis decode operations. It is available only when the ConnX Vectra DSP engine is also selected.

ConnX Baseband Engine (BBE16)

The ConnX BBE16 is built around a core vector pipeline made of sixteen 18x18-bit MACs. These multipliers and their associated adder and multiplexer trees enable operations such as FFT butterflies, parallel complex multiply operations, and signal filter structures.
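To show why such operations lean so heavily on multipliers and adders, here is a minimal scalar sketch in portable C of a complex multiply and a radix-2 FFT butterfly. One complex multiply needs four real multiplies and two adds, and a butterfly adds four more additions on top of that. The type and function names (cplx16, cplx64, cmul, butterfly) are hypothetical, the code uses plain integer arithmetic with no fixed-point scaling, and it does not model how the BBE16 actually schedules these operations.

```c
#include <stdint.h>

/* Hypothetical complex types for this sketch: 16-bit inputs, wide results. */
typedef struct { int16_t re, im; } cplx16;
typedef struct { int64_t re, im; } cplx64;

/* Complex multiply x * w: four real multiplies and two additions.
 * Operands are widened to 64 bits so the sums cannot overflow. */
cplx64 cmul(cplx16 x, cplx16 w)
{
    cplx64 p;
    p.re = (int64_t)x.re * w.re - (int64_t)x.im * w.im;
    p.im = (int64_t)x.re * w.im + (int64_t)x.im * w.re;
    return p;
}

/* Radix-2 butterfly: sum = a + w*b, diff = a - w*b.
 * A real fixed-point FFT would also rescale the twiddle product;
 * that step is omitted here to keep the arithmetic easy to follow. */
void butterfly(cplx16 a, cplx16 b, cplx16 w, cplx64 *sum, cplx64 *diff)
{
    cplx64 t = cmul(b, w);
    sum->re  = a.re + t.re;
    sum->im  = a.im + t.im;
    diff->re = a.re - t.re;
    diff->im = a.im - t.im;
}
```

With a = 1 + 2j, b = 3 + 4j, and w = j, the product w*b is -4 + 3j, so the butterfly yields sum = -3 + 5j and diff = 5 - 1j.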

ConnX Baseband Engine (BBE64)

The ConnX BBE64 was designed for the computationally intensive tasks required for LTE-Advanced communications. It is a high-performance DSP with 64 or 128 simultaneous 18x18-bit MACs/cycle supporting more than 100 billion high-precision multiply/accumulate operations per second.

Tensilica Instruction Extensions (TIE)

For designs with one or more signal processing tasks that require acceleration beyond the base features and predefined options of the Xtensa LX, the designer can quickly add instructions and hardware execution units tailored to a specific algorithm.

For example, the “butterfly” operation used in convolutional coding and Viterbi decoding applications is a series of combinational Add-Compare-Select (ACS) operations. If the data in question consists of 8-bit values packed in the standard 32-bit registers of Xtensa LX, a designer can easily add an ACS instruction to the Xtensa LX processor, with a small incremental block of execution unit hardware, to greatly speed up Viterbi decoding for communications applications.
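The behavior such an instruction would replace can be sketched as a C reference model. The function below processes four 8-bit path metrics packed in each 32-bit word: for every lane it adds a branch metric to each of two candidate path metrics, compares the results, and selects the smaller one, recording which candidate won. Everything specific here is an assumption made for illustration: the name acs4_u8, unsigned metrics, saturating addition, and choosing the minimum. This is not the definition of any actual TIE instruction.

```c
#include <stdint.h>

/* Extract 8-bit lane i (0 = least significant byte) from a packed word. */
static uint8_t lane(uint32_t w, int i)
{
    return (uint8_t)(w >> (8 * i));
}

/* Saturating unsigned 8-bit add (assumed here so metrics never wrap). */
static uint8_t add_sat_u8(uint8_t a, uint8_t b)
{
    unsigned s = (unsigned)a + (unsigned)b;
    return (uint8_t)(s > 255u ? 255u : s);
}

/* Four-lane Add-Compare-Select on packed 8-bit metrics (hypothetical model).
 * pm0/pm1: path metrics of the two candidate predecessors, four per word.
 * bm0/bm1: branch metrics for the corresponding transitions.
 * Returns the four surviving (smaller) metrics, packed the same way.
 * Bit i of *decisions is set when candidate 1 won in lane i. */
uint32_t acs4_u8(uint32_t pm0, uint32_t bm0, uint32_t pm1, uint32_t bm1,
                 uint8_t *decisions)
{
    uint32_t survivors = 0;
    uint8_t dec = 0;
    for (int i = 0; i < 4; ++i) {
        uint8_t c0 = add_sat_u8(lane(pm0, i), lane(bm0, i));   /* add     */
        uint8_t c1 = add_sat_u8(lane(pm1, i), lane(bm1, i));
        uint8_t best = c0;
        if (c1 < c0) {                                         /* compare */
            best = c1;                                         /* select  */
            dec |= (uint8_t)(1u << i);
        }
        survivors |= (uint32_t)best << (8 * i);
    }
    *decisions = dec;
    return survivors;
}
```

In software this loop takes many instructions per lane. The point of a custom ACS instruction is that all four lanes can be handled by a small block of combinational hardware in a single operation.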

See our for examples of using TIE for DSP functions.

Advanced TIE

For applications with well-defined, very high performance signal processing computational demands, the TIE language provides a fast means of developing extremely powerful DSP extensions. Add custom registers and register files for unique data types. Create complex multiple-operation instructions and automatically pipeline those instructions into multi-cycle instructions by specifying a command directive in the TIE language that takes only one line of text in a TIE description. Create SIMD (single instruction, multiple data) instructions to tackle algorithms with native data parallelism. Use software-pipelining techniques to create combined compute-and-load, compute-and-store instructions for high data-rate applications that enable continuous computation without the performance overhead of processor load and store cycles.

FLIX (Flexible Length Instruction Xtensions)

Tensilica improved compute performance in the Xtensa LX processor through its innovative FLIX (Flexible Length Instruction Xtensions) architecture. The FLIX architecture is a highly efficient implementation of the Xtensa instruction set architecture (ISA) that gives designers more options for cost/performance tradeoffs. FLIX technology provides the flexibility to freely and modelessly intermix instructions of various lengths (16-bit, 24-bit, and 32- or 64-bit). By packing multiple operations into a wide 32- or 64-bit instruction word, FLIX technology allows designers to accelerate a broader class of application “hot spots.” FLIX eliminates the performance and code-size drawbacks that can occur when using a one-size-fits-all instruction length. Compared to rigid, high-performance processor designs that either encode only one RISC operation per instruction or use ultra-wide 64-, 128-, or 256-bit VLIW (very long instruction word) formats, FLIX delivers high-performance concurrent execution exactly and only when needed, yet preserves the industry-leading code density advantages of the Xtensa processor’s native 16- and 24-bit base architecture instruction formats.

* The BDTIsimMark2000™ provides a summary measure of DSP speed. For more information and scores see www.BDTI.com. Scores © 2004 BDTI.
