The Xtensa LX3 32-bit architecture features a compact instruction set optimized for embedded designs. The base architecture has a 32-bit ALU, up to 64 general-purpose physical registers, six special-purpose registers, and 80 base instructions, which use 16- and 24-bit (rather than 32-bit) RISC instruction encodings. Its key features are described below.
Configuring a Tensilica processor core never compromises the underlying base Xtensa instruction set architecture (ISA), which ensures a robust ecosystem of third-party application software and development tools. All configurable, extensible Xtensa processors remain compatible with major operating systems, debug probes, and ICE solutions. For each processor, the automatically generated, complete software development toolchain includes an Integrated Development Environment (IDE) based on the Eclipse framework, a C/C++ compiler, a cycle-accurate, SystemC-compatible instruction set simulator, and the full industry-standard GNU toolchain.
Tensilica's ISA has been backward compatible since its introduction in 1999. It has a base set of 80 instructions and was architected from the start for extensibility. Application code written in 1999 still runs on the Xtensa LX3 processor today, and differentiating designer-defined instructions from earlier designs can be reused.
The Xtensa LX3 dataplane processor unit (DPU) can freely intermix 24-bit and 16-bit instructions without any mode switch, giving 25-50% better code density, and therefore smaller memories, than architectures that mix 32- and 16-bit instructions. Because memories typically dominate SOC area, this code-density advantage translates into significant SOC area savings.
The Xtensa ISA includes powerful compare-and-branch instructions and zero-overhead loops, which allow the compiler to generate tight, optimized loops. It also provides bit-manipulation operations, including funnel shifts and field extraction, that are critical for applications such as networking, which must process the fields in packet headers and perform rule-based checks.
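The C sketch below shows what a field-extract operation computes: a right shift followed by a mask. The helper names are hypothetical, and the example decodes the first 32-bit word of an IPv4 header, assumed to be already converted to host byte order. Written this way, the expression is one a compiler can often map onto a single field-extract instruction rather than separate shift and AND instructions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: return the `width`-bit field that starts at bit `lsb`
   of `word` (bit 0 is the least significant bit). The shift-then-mask form
   is exactly what a single field-extract operation performs. */
static uint32_t extract_field(uint32_t word, unsigned lsb, unsigned width)
{
    assert(width >= 1 && width < 32 && lsb + width <= 32);
    uint32_t mask = (1u << width) - 1u;
    return (word >> lsb) & mask;
}

/* Fields of the first word of an IPv4 header (host byte order assumed):
   bits 31-28 version, bits 27-24 header length (IHL), bits 15-0 total length. */
static unsigned ipv4_version(uint32_t w0)      { return extract_field(w0, 28, 4); }
static unsigned ipv4_ihl(uint32_t w0)          { return extract_field(w0, 24, 4); }
static unsigned ipv4_total_length(uint32_t w0) { return extract_field(w0, 0, 16); }
```

For the word 0x45000054, these helpers return version 4, header length 5 (20 bytes), and total length 84.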
One of the fundamental innovations in the Xtensa processor is the ability to add new instructions to the processor's datapath easily and seamlessly. The associated C data types, software toolchain support, and the EDA scripts required to synthesize the processor are all generated automatically, just as if the new instructions had been there from the start. The new datapath, its instructions, and their C data types are specified in the Tensilica Instruction Extension (TIE) language, which is described in more detail below.
For more information on the Xtensa ISA, see the Xtensa ISA databook (available as a PDF).
The Xtensa LX3 Architecture
There are several important innovations in the Xtensa LX3 architecture.
To address the growing speed disparity between standard-cell logic and memories (memory access speeds have not scaled as well as logic in the migration from 90 nm to 65 nm and now to 45 nm and 40 nm processes), the Xtensa LX3 processor features a configurable pipeline. Designers can select a 7-stage pipeline option that adds one clock cycle to instruction and data memory accesses when the application requires it. While the standard 5-stage pipeline is very efficient for many applications, designers using large local memories or specialized low-power memories with longer access times will benefit from the longer pipeline, because it permits a higher system clock frequency.
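The trade-off can be illustrated with a simplified timing model. The sketch assumes the clock period must cover both the slowest logic stage and the portion of a memory access that falls within one cycle; the 5-stage pipeline gives each local-memory access one cycle and the 7-stage option gives it two. The function name and the delay figures are hypothetical, and the model ignores register overhead and the extra latency a longer pipeline adds to loads and branches.

```c
#include <assert.h>

/* Simplified, hypothetical model: maximum clock frequency in MHz, given the
   delay of the slowest logic stage, the local-memory access time, and the
   number of pipeline cycles allotted to each memory access (1 for the
   5-stage pipeline, 2 for the 7-stage option). Delays are in nanoseconds. */
static double max_clock_mhz(double logic_stage_ns, double mem_access_ns,
                            int mem_cycles)
{
    assert(logic_stage_ns > 0.0 && mem_access_ns > 0.0 && mem_cycles >= 1);
    double mem_per_cycle_ns = mem_access_ns / mem_cycles;
    /* The clock period is set by whichever path is slower. */
    double period_ns = logic_stage_ns > mem_per_cycle_ns ? logic_stage_ns
                                                         : mem_per_cycle_ns;
    return 1000.0 / period_ns; /* 1000 / ns = MHz */
}
```

With a 2.0 ns logic stage and a 3.0 ns memory, a one-cycle access limits the clock to about 333 MHz, while a two-cycle access lets the logic set the limit at 500 MHz. When the memory is already faster than the logic, the longer pipeline brings no clock benefit in this model.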
The Tensilica Instruction Extension (TIE) language is used to describe new instructions, new registers and execution units, and new I/O ports that are then automatically added to the Xtensa LX3 processor. TIE is a Verilog-like language used to describe instruction mnemonics, operands, encodings, and execution semantics. Designers can use the XPRES Compiler to generate TIE files automatically and then modify them for further optimization. TIE files are inputs to the Xtensa Processor Generator, which automatically builds a version of the Xtensa LX3 processor, together with a complete toolchain, that incorporates the new TIE instructions.
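A natural starting point for a new instruction is a C reference model of the operation it should perform. The sketch below is such a model for a hypothetical SAD4 operation (the sum of absolute differences over four bytes packed into two 32-bit words, a common motion-estimation kernel); it is plain C, not TIE. Once an equivalent operation is described in TIE, the generated compiler makes it available to C code, so the call to the reference function could be replaced by the new instruction. The name under which it appears depends on the TIE description, so none is shown here.

```c
#include <assert.h>
#include <stdint.h>

/* Reference model of a hypothetical SAD4 operation: the sum of absolute
   differences between the four unsigned bytes packed in `a` and in `b`.
   In base instructions this takes several operations per byte; a custom
   instruction could compute it in a single instruction. */
static uint32_t sad4_ref(uint32_t a, uint32_t b)
{
    uint32_t sum = 0;
    for (unsigned i = 0; i < 4; i++) {
        uint32_t x = (a >> (8u * i)) & 0xFFu;
        uint32_t y = (b >> (8u * i)) & 0xFFu;
        sum += (x > y) ? x - y : y - x;
    }
    return sum;
}

/* Typical caller: SAD over a row of `n` packed words (4 * n pixels). */
static uint32_t sad_row(const uint32_t *cur, const uint32_t *ref, unsigned n)
{
    assert(cur && ref);
    uint32_t total = 0;
    for (unsigned i = 0; i < n; i++)
        total += sad4_ref(cur[i], ref[i]);
    return total;
}
```

Keeping the reference model around is also useful afterwards: its results can be compared against the new instruction's results in the instruction set simulator.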
Many of the major pre-configured functional blocks take advantage of Tensilica's FLIX (Flexible Length Instruction Xtensions) capabilities.
FLIX turns the Xtensa LX3 into a VLIW processor that can execute 2 to 15 operations in parallel when needed. Wide 32- and 64-bit FLIX instruction formats are seamlessly intermixed with the base Xtensa 16- and 24-bit instructions, so there is no mode-switch penalty when using FLIX.

With FLIX, the Xtensa LX3 processor can deliver the very high performance of a specialty ultra-wide instruction word (ULIW) processor without the code-size penalty typically found in VLIW and ULIW processors. In fact, Xtensa LX3 processors with FLIX can often deliver higher performance and smaller code size at the same time. This performance comes with very little overhead: FLIX adds only 2,000 gates to the processor for instruction decode and control.
The Xtensa C/C++ Compiler (XCC) automatically extracts parallelism from source code and bundles multiple operations into FLIX instructions. In this way, a 3-issue Xtensa LX3 processor running at 300 MHz can deliver up to the performance of a single-issue processor running at 900 MHz. The compiler can also bundle branch and load/store instructions in parallel with compute instructions, gaining a performance boost over executing them sequentially.
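The 900 MHz figure is an upper bound: it assumes every FLIX bundle carries three useful operations. The sketch below computes the clock rate a single-issue processor would need to match a multi-issue one, given how many operations were packed into how many bundles. The function name and the counts in the example are hypothetical.

```c
#include <assert.h>

/* Hypothetical helper: clock rate (MHz) a single-issue processor would need
   to match a multi-issue processor that executes `ops` operations packed
   into `bundles` instructions at `clock_mhz`. Each single-issue instruction
   is assumed to carry exactly one operation, and each bundle at least one. */
static double equivalent_mhz(double clock_mhz, unsigned long ops,
                             unsigned long bundles)
{
    assert(bundles > 0 && ops >= bundles);
    return clock_mhz * (double)ops / (double)bundles;
}
```

At 300 MHz, 3,000 operations in 1,000 bundles (every slot filled) corresponds to 900 MHz; if the compiler fills only 2,200 slots, the equivalent drops to 660 MHz.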
Designers can go beyond the capabilities in the major pre-configured functional blocks and use the FLIX capabilities in their own design by using Tensilica's TIE language to specify exactly what's needed.
See our Hot Chips conference paper, “Long Words and Wide Ports: Reinventing the Configurable Processor.”
The Xtensa LX3 processor brings another fundamental breakthrough in embedded processor design: the ability to define direct data interfaces into and out of the processor for maximum data throughput. This ability is a key reason that Tensilica's Xtensa LX3 is ideal for the SOC dataplane.
Tensilica provides three direct interface capabilities that are described in more detail on our I/O page.
TIE Ports: These function like general-purpose I/O (GPIO) wires, providing direct connections to other logic within an SOC or to other Xtensa processors. Each is created with a simple one-line declaration in a TIE file.
TIE Queues: These function like FIFO interfaces. A TIE input queue presents a familiar pop/empty/data interface to external logic, while a TIE output queue presents a similar push/full/data interface. All interactions with the Xtensa LX3 processor pipeline are implemented automatically by the Xtensa Processor Generator.
TIE Lookups: These enable designers to connect RAMs or external devices to Xtensa LX3 processors. The external memories or devices can be accessed directly from the processor's datapath without load/store instructions. These interfaces are useful for connecting table-lookup RAMs, for example in networking applications, or long-latency hardware computation units.
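The push/full/data and pop/empty/data handshakes of TIE Queues can be reasoned about with a software model of a bounded FIFO. The C sketch below is only a behavioral model of that handshake; it is not code produced by the Xtensa Processor Generator, and the type, function names, and depth are hypothetical. Where the model returns `false` (push while full, pop while empty), the hardware side has to wait until the condition clears.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define QUEUE_DEPTH 4u /* hypothetical depth */

typedef struct {
    uint32_t data[QUEUE_DEPTH];
    unsigned head;  /* index of the next entry to pop */
    unsigned count; /* number of valid entries */
} fifo_t;

static bool fifo_full(const fifo_t *q)  { return q->count == QUEUE_DEPTH; }
static bool fifo_empty(const fifo_t *q) { return q->count == 0; }

/* Producer side, mirroring an output queue's push/full/data handshake. */
static bool fifo_push(fifo_t *q, uint32_t value)
{
    if (fifo_full(q))
        return false; /* "full" asserted: the push cannot complete yet */
    q->data[(q->head + q->count) % QUEUE_DEPTH] = value;
    q->count++;
    assert(q->count <= QUEUE_DEPTH);
    return true;
}

/* Consumer side, mirroring an input queue's pop/empty/data handshake. */
static bool fifo_pop(fifo_t *q, uint32_t *value)
{
    if (fifo_empty(q))
        return false; /* "empty" asserted: no data to pop yet */
    *value = q->data[q->head];
    q->head = (q->head + 1) % QUEUE_DEPTH;
    q->count--;
    return true;
}
```

A zero-initialized `fifo_t` is an empty queue; entries come out in the order they were pushed, and the ring-buffer indexing lets the producer refill a slot as soon as the consumer has freed it.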
Almost 1000 logical connections (Ports, Queues, and Lookups), each up to 1024 pins wide, can be added to each Xtensa LX3 processor. Because Tensilica's Ports and Queues can be used every clock cycle without load/store instructions, I/O bandwidth scales with the number and width of the interfaces instead of being limited by load/store traffic.
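Because each Port or Queue can transfer its full width every cycle, peak interface bandwidth is simply the width times the clock frequency, summed over the interfaces. The sketch below does that arithmetic; the function name, the interface widths, and the 300 MHz clock in the example are hypothetical, and real throughput also depends on how often the software actually issues the transfers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Peak aggregate bandwidth in bits per second, assuming every interface
   transfers its full width on every clock cycle. */
static uint64_t peak_bits_per_second(const unsigned *widths_bits, size_t n,
                                     uint64_t clock_hz)
{
    assert(widths_bits != NULL || n == 0);
    uint64_t total_width = 0;
    for (size_t i = 0; i < n; i++) {
        /* Per-interface width limit stated above: up to 1024 pins. */
        assert(widths_bits[i] >= 1 && widths_bits[i] <= 1024);
        total_width += widths_bits[i];
    }
    return total_width * clock_hz;
}
```

A single 1024-bit queue at 300 MHz, for example, peaks at 307.2 Gbit/s (38.4 Gbyte/s).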
Tensilica has implemented several automatic features, such as the insertion of fine-grained clock gating, to provide a low-power processor for mobile applications. In addition, Tensilica's Xenergy energy estimation tool can be used to optimize both the Xtensa LX3 configuration and TIE instructions, and to tune the application software for energy. See our low-power section.