Processor Ports and Queues: Easily Overcome I/O Bandwidth Obstacles in Your Next ASIC or SOC Design
How to Increase ASICs and SOC Computational Performance with Long-Word Processors
Minimize Energy Consumption While Maximizing ASIC and SOC Performance
Why High MHz Does Not Mean High Performance
Double-Precision Floating Point Emulation Acceleration
Fast OFDM on Xtensa Processors
Implementing FIFO Operations Using TIE Queues
See Building a Multi-Issue Vector DSP with Configurable-Processor Technology from GSPx 2004.
See BDTI's independent analysis of the Xtensa LX processor with Vectra LX.
See Microprocessor Report's article Applications Define DSP Speed.
The Xtensa LX2 processor excels at traditional CPU and DSP tasks in embedded SOCs, as demonstrated by industry-leading results on the BDTI Benchmarks™ from Berkeley Design Technology, Inc. (BDTI). The Xtensa LX configurable processor core achieved the highest score recorded to date for a licensable processor core: its BDTIsimMark2000 score of 6150 at 370 MHz is 70% higher than the score of the next-fastest licensable core benchmarked by BDTI, the CEVA-X1620.*
Tensilica’s customers today are already using the Xtensa processor core for a variety of DSP tasks including audio processing, image processing, video processing, and communications channel processing. Additionally, Tensilica offers specialized pre-configured, optimized DSPs for audio and video processing.
The Xtensa LX2 processor can be used in such a wide variety of applications because Tensilica offers four separate means of accelerating DSP computations.
For moderate intensity signal processing applications, a 16-bit multiply-accumulate engine can be added to the base Xtensa LX2 processor core with just a click of a configuration button in the Xtensa processor generator. Inclusion of the MAC16 option adds a full suite of multiply / accumulate instructions including auto-incrementing loads and combined multiply-accumulate-load instructions for high performance computation. These DSP instructions are also 100% compiler supported.
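The kind of loop that a multiply-accumulate option targets can be sketched in plain C. This is a minimal illustration, not Tensilica code: the function name `dot16` is hypothetical, and the 64-bit accumulator is chosen for portability rather than to mirror the width of the hardware accumulator. Each iteration performs one 16x16-bit multiply, adds it to a running sum, and steps both pointers forward, which is the pattern that multiply-accumulate instructions with auto-incrementing loads are designed to execute.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Dot product of two arrays of 16-bit samples.
 * Hypothetical helper for illustration only; the accumulator width
 * (64 bits) is an assumption made for portable C, not a hardware detail. */
int64_t dot16(const int16_t *x, const int16_t *h, size_t n)
{
    assert(x != NULL && h != NULL);

    int64_t acc = 0;
    for (size_t i = 0; i < n; i++) {
        /* One 16x16 -> 32-bit multiply, accumulated into the running sum. */
        acc += (int32_t)x[i] * (int32_t)h[i];
    }
    return acc;
}
```

Calling `dot16` on the arrays `{1, 2, 3, -4}` and `{5, 6, 7, 8}` with `n = 4` returns 6. Whether a given compiler maps this exact loop onto multiply-accumulate instructions depends on the processor configuration and compiler settings.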
For applications with one or more signal processing tasks that require some acceleration beyond the base RISC processor features of Xtensa LX2, the designer can quickly add instructions and hardware execution units tailored to a specific algorithm.
For example, the “butterfly” operation used in convolutional coding / Viterbi decoding applications is a series of combined Add-Compare-Select (ACS) operations. If the data in question consists of 8-bit values packed in the standard 32-bit registers of Xtensa LX2, a designer can easily add an ACS instruction to the Xtensa LX2 processor, with a small incremental block of execution unit hardware, to greatly speed up Viterbi decoding for communications applications.
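As a rough picture of what such an instruction would compute, the following C reference model performs four Add-Compare-Select operations on 8-bit values packed into 32-bit words. It is a sketch under stated assumptions, not the definition of any actual Xtensa instruction: the names `acs4_u8`, `acs_result`, and `lane_u8` are hypothetical, and the choices of unsigned metrics, saturating addition, minimum-metric selection, and ties going to candidate A are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Result of four parallel ACS operations (hypothetical type). */
typedef struct {
    uint32_t metrics;   /* four surviving 8-bit path metrics, packed */
    uint8_t  decisions; /* bit i is set when candidate B won in lane i */
} acs_result;

/* Extract one 8-bit lane (0..3) from a packed 32-bit word. */
static unsigned lane_u8(uint32_t word, int lane)
{
    assert(lane >= 0 && lane < 4);
    return (word >> (8 * lane)) & 0xFFu;
}

/* Add: path metric + branch metric for two candidates per lane.
 * Compare: find the smaller sum. Select: keep it and record the choice. */
acs_result acs4_u8(uint32_t pm_a, uint32_t bm_a, uint32_t pm_b, uint32_t bm_b)
{
    acs_result r = {0, 0};
    for (int lane = 0; lane < 4; lane++) {
        unsigned a = lane_u8(pm_a, lane) + lane_u8(bm_a, lane);
        unsigned b = lane_u8(pm_b, lane) + lane_u8(bm_b, lane);
        if (a > 0xFFu) a = 0xFFu;   /* assumed: saturate at 8 bits */
        if (b > 0xFFu) b = 0xFFu;

        unsigned best = a;          /* assumed: ties keep candidate A */
        if (b < a) {
            best = b;
            r.decisions |= (uint8_t)(1u << lane);
        }
        r.metrics |= (uint32_t)best << (8 * lane);
    }
    return r;
}
```

In software this takes a loop of shifts, masks, adds, and compares per 32-bit word. A custom ACS instruction would aim to perform the equivalent of the whole loop body for all four lanes in one operation, which is where the speedup for Viterbi decoding comes from.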
For applications with well-defined, very high performance signal processing computational demands, the TIE language provides a fast means of developing extremely powerful DSP extensions. Add custom registers and register files for unique data types. Create complex multiple-operation instructions and automatically pipeline those instructions into multi-cycle instructions by specifying a command directive in the TIE language that takes only one line of text in a TIE description. Create SIMD (single instruction, multiple data) instructions to tackle algorithms with native data parallelism. Use software-pipelining techniques to create combined compute-and-load, compute-and-store instructions for high data-rate applications that enable continuous computation without the performance overhead of processor load and store cycles.
Tensilica improved compute performance in the Xtensa LX processor through its innovative FLIX (Flexible Length Instruction Xtensions) architecture. The FLIX architecture is a highly efficient implementation of the Xtensa instruction set architecture (ISA) that gives designers more options for cost/performance tradeoffs. The FLIX technology provides the flexibility to freely and modelessly intermix instructions of various lengths (16-, 24-, or 32-/64-bit). By packing multiple operations into a wide 32- or 64-bit instruction word, FLIX technology allows designers to accelerate a broader class of “hot spots” in embedded applications. FLIX eliminates the performance and code-size drawbacks that can occur when using a one-size-fits-all instruction length. Compared to rigid, high-performance processor designs that either encode only one RISC operation per instruction or use ultra-wide 64b/128b/256b VLIW (very long instruction word) formats, FLIX delivers high-performance concurrent execution exactly and only when needed, yet preserves the industry-leading code density advantages of the Xtensa processor’s native 16b/24b base architecture instruction formats.
See our Embedded Processor Forum 2004 presentation on Vectra LX titled “A Second-Generation High-performance DSP Engine.”
The Vectra LX DSP engine can be added to the base Xtensa LX2 processor core with just a click of a configuration button in the Xtensa LX2 processor generator. The Vectra LX engine takes advantage of the FLIX architecture and uses 64-bit instruction words containing three issue slots for ALU, multiply-accumulate, and load/store operations. Design teams interested in modifying the Vectra LX DSP engine for specific configurations should contact Tensilica. The Vectra LX engine is fully supported by the entire Tensilica software environment, including advanced auto-vectorization capabilities in the Xtensa C/C++ Compiler (XCC). XCC enables Vectra LX engine users to reap the benefits of vector processing on a SIMD engine without manual assembly-level coding.
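To illustrate the style of C source that auto-vectorizing compilers generally handle well, here is a minimal sketch of an element-wise saturating add over 16-bit samples. The function names `sat16` and `vadd_sat16` are hypothetical, and whether XCC vectorizes this particular loop for a given Vectra LX configuration is an assumption, not something the text above states. The loop has a simple trip count, independent iterations, and `restrict`-qualified pointers that promise the arrays do not overlap, all of which make it easier for a compiler to process several elements per instruction.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Clamp a 32-bit intermediate to the 16-bit signed range. */
static inline int16_t sat16(int32_t v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* out[i] = saturate(a[i] + b[i]) for i in [0, n).
 * 'restrict' asserts that out, a, and b do not alias, so iterations
 * are independent and can be grouped into SIMD operations. */
void vadd_sat16(int16_t *restrict out,
                const int16_t *restrict a,
                const int16_t *restrict b,
                size_t n)
{
    assert(out != NULL && a != NULL && b != NULL);

    for (size_t i = 0; i < n; i++) {
        out[i] = sat16((int32_t)a[i] + (int32_t)b[i]);
    }
}
```

The point of the sketch is that the source stays ordinary scalar C; any mapping onto vector instructions is left to the compiler.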
| Performance level | Processor configuration | Cycle count |
|---|---|---|
| Simple RISC Engine | Minimal configuration Xtensa LX2 using software multiply | 155,389 cycles |
| Scalar Performance | Base Xtensa LX2 processor with MUL32 option | 23,633 cycles |
| FLIX Performance | Xtensa LX2 with Vectra LX option | 994 cycles |
Vectra LX DSP engine really accelerates FFT performance
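The cycle counts in the table translate directly into speedup ratios. Dividing one cycle count by another, the MUL32 configuration is roughly 6.6 times faster than the minimal configuration (155,389 / 23,633), the Vectra LX configuration is roughly 156 times faster than the minimal configuration (155,389 / 994), and roughly 23.8 times faster than the MUL32 configuration (23,633 / 994). The small helper below, whose name `speedup` is hypothetical, simply performs that division; it assumes all three rows measure the same workload and says nothing about clock frequency.

```c
#include <assert.h>

/* Speedup of an optimized run over a baseline run of the same workload,
 * expressed as the ratio of their cycle counts. */
double speedup(double baseline_cycles, double optimized_cycles)
{
    assert(optimized_cycles > 0.0);
    return baseline_cycles / optimized_cycles;
}
```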
* The BDTIsimMark2000™ provides a summary measure of DSP speed. For more information and scores see www.BDTI.com. Scores © 2004 BDTI.