Xtensa LX4 Architecture

A Compact Instruction Set Optimized for Low-Power Embedded Applications

The Xtensa LX 32-bit architecture features a compact instruction set optimized for embedded designs. The base architecture has a 32-bit ALU, up to 64 general-purpose physical registers, six special purpose registers, and 80 base instructions, including improved 16- and 24-bit (rather than 32-bit) RISC instruction encoding. Key features include:

  • A wide range of configurable options to ensure you get just the logic you need to meet your functional and performance requirements
  • Modeless intermixing of standard 16- and 24-bit instructions with custom 32-, 64-, or 128-bit VLIW instructions, for the lowest code size and performance overhead
  • Selectable 5- or 7-stage pipeline to match memory access times
  • Virtually unlimited I/O bandwidth with optional Queue (FIFO), Port (GPIO) and Lookup interfaces for data transfers that don't require system bus bandwidth
  • One or two 32/64/128/256/512-bit wide Load/Store units
  • Local memories configurable up to 8MB with optional parity or ECC
  • Automated fine-grained clock gating throughout processor for ultra-low power solutions

Base Instruction Set Compatibility

Configurability of a Tensilica processor core never compromises the underlying base Xtensa instruction set architecture (ISA), thereby ensuring availability of a robust ecosystem of third party application software and development tools. All configurable, extensible Xtensa processors are always compatible with major operating systems, debug probes, and ICE solutions. For each processor, the automatically generated complete software development toolchain includes an advanced Integrated Development Environment (IDE) based on the ECLIPSE framework, a world-class C/C++ compiler, a cycle-accurate SystemC-compatible instruction set simulator, and the full industry standard GNU Toolchain.

Tensilica uses an ISA that has been backwards compatible since its introduction in 1999. It has a base instruction set of 80 instructions and was fundamentally architected for extensibility. Application code written in 1999 still runs on the Xtensa LX4 processor today, and any differentiating designer-defined instructions from earlier designs can be reused.

Smaller Code Size

The Xtensa LX DPU can modelessly intermix 24-bit and 16-bit instructions, leading to 25-50% better code density and, therefore, smaller memories than mixed 32- and 16-bit architectures. Since memories typically dominate SOC area, this code density advantage translates into significant SOC area savings.

Powerful Base ISA

The Xtensa ISA includes powerful compare-and-branch instructions and zero-overhead loops, which allow the compiler to generate tight, optimized loops. It also provides bit manipulations including funnel shifts and field-extract operations that are critical for applications such as networking that process the fields in packet headers and perform rule-based checks.
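
As a rough illustration of the bit-manipulation operations mentioned above, the following portable C sketch models what a field extract and a funnel shift compute. The function names and the IPv4 example are illustrative assumptions, not part of the Xtensa toolchain. Whether a given compiler maps these patterns onto single instructions is also an assumption, since this page does not specify it.

```c
#include <assert.h>
#include <stdint.h>

/* Field extract: return `width` bits of `word` starting at bit `lsb`
 * (bit 0 is the least significant), right-justified and zero-extended. */
uint32_t extract_field(uint32_t word, unsigned lsb, unsigned width)
{
    assert(width >= 1 && width <= 32 && lsb + width <= 32);
    uint32_t mask = (width == 32) ? UINT32_MAX : ((UINT32_C(1) << width) - 1u);
    return (word >> lsb) & mask;
}

/* Funnel shift right: treat hi:lo as one 64-bit value, shift it right by
 * `amount` bits (0..31), and keep the low 32 bits of the result. */
uint32_t funnel_shift_right(uint32_t hi, uint32_t lo, unsigned amount)
{
    assert(amount < 32);
    uint64_t combined = ((uint64_t)hi << 32) | lo;
    return (uint32_t)(combined >> amount);
}

/* Hypothetical packet-header use: the first 32-bit word of an IPv4 header,
 * already converted to host order, holds the version in bits 31..28 and
 * the header length (in 32-bit words) in bits 27..24. */
unsigned ipv4_header_length_words(uint32_t first_word)
{
    return extract_field(first_word, 24, 4);
}
```

For a first header word of 0x45000054, ipv4_header_length_words returns 5. A funnel shift is handy when a field straddles two adjacent words, because both halves can be combined and aligned in one step.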

Extensible ISA

One of the fundamental technology innovations in the Xtensa processor is the ability to easily and seamlessly add new instructions into the processor's datapath. The associated C data types, software tool chain support and the EDA scripts required to synthesize the processor are all generated automatically, just as if they had been there from the start. The specification of this new datapath and associated instructions and C data types is written in the Tensilica Instruction Extension (TIE) language.

For more information on the Xtensa ISA, download the PDF of the Xtensa ISA databook.


The Xtensa LX4 Architecture

Innovative Architecture

There are several important innovations in the Xtensa LX4 architecture.

Optional 7-stage Pipeline

To address the growing speed disparity between standard cell logic and memories (memory access speeds have not scaled as well as logic in the migration to smaller process geometries), the Xtensa LX processor features a configurable pipeline. Designers can select a 7-stage pipeline option that adds a clock cycle for instruction and data memory accesses if the application requires it. While the Xtensa LX processor’s standard 5-stage pipeline is very efficient for many applications, designers employing large local memories or specialized low-power memories with longer access times can gain a higher system clock frequency by moving to the longer pipeline.
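
The trade-off can be sketched with a simplified timing model. It assumes the clock period is set by whichever is slower: the logic delay of one pipeline stage, or the memory access time divided by the cycles allotted to it (one cycle in the 5-stage configuration, two in the 7-stage configuration). The function names and the delay figures in the note below are hypothetical, chosen only to show the arithmetic.

```c
#include <assert.h>

/* Smallest usable clock period, in picoseconds, when each memory access
 * is given `mem_cycles` clock cycles to complete. */
unsigned min_clock_period_ps(unsigned logic_delay_ps,
                             unsigned mem_access_ps,
                             unsigned mem_cycles)
{
    assert(mem_cycles >= 1);
    /* Ceiling division: time each cycle must cover for the memory path. */
    unsigned mem_per_cycle_ps = (mem_access_ps + mem_cycles - 1u) / mem_cycles;
    return (logic_delay_ps > mem_per_cycle_ps) ? logic_delay_ps
                                               : mem_per_cycle_ps;
}

/* Convert a clock period in picoseconds to a frequency in MHz
 * (rounded down): 1 MHz corresponds to a period of 1,000,000 ps. */
unsigned max_clock_mhz(unsigned period_ps)
{
    assert(period_ps > 0);
    return 1000000u / period_ps;
}
```

With a hypothetical 1500 ps logic stage and a 2500 ps memory, the one-cycle case is limited by the memory to 400 MHz. The two-cycle case is limited by the logic instead and reaches 666 MHz in this model. When the memory is already faster than the logic, the extra stages buy nothing, which matches the point that the 5-stage pipeline suits many applications.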

Processor Extensions - Accelerating Processor Performance

The Tensilica Instruction Extension (TIE) language is used to describe new instructions, new registers and execution units, and new I/O ports that are then automatically added to the Xtensa LX processor. TIE is a Verilog-like language used to describe desired instruction mnemonics, operands, encoding and execution semantics. TIE files are inputs to the Xtensa Processor Generator. The Generator automatically builds a version of the Xtensa LX processor and the complete tool chain that incorporates the new TIE instructions.

FLIX for Parallel Execution (VLIW)

Many of the major pre-configured functional blocks take advantage of Tensilica's FLIX capabilities.

The FLIX architecture turns the Xtensa LX into a VLIW processor that can issue operations to between 2 and 30 execution units in parallel when needed. Wide 32/64/128-bit FLIX instruction formats are seamlessly intermixed with the base Xtensa 16/24-bit instructions, so there is no mode-switch penalty when using FLIX.

FLIX Formats

With FLIX, the Xtensa LX processor can deliver the ultra-high performance characteristics of a specialty ultra-wide instruction word processor without the negative code size implications typically found in such VLIW or ULIW processors. In fact, Xtensa LX processors with FLIX can often deliver higher performance and smaller code size at the same time. This performance comes with very little overhead, adding only 2,000 gates to the size of the processor for instruction decode and control.

The Xtensa C/C++ Compiler (XCC) automatically extracts parallelism from source code and bundles multiple operations into FLIX instructions. In this way, a 3-issue Xtensa LX processor running at 300 MHz can deliver performance up to the equivalent of a 900 MHz processor. Additionally, the compiler can bundle the branch and load/store instructions in parallel with compute instructions to gain a performance boost over straight-line code.
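
The "up to" in the 900 MHz figure is a peak bound: issue width multiplied by clock rate. A minimal C sketch of that arithmetic follows. The slot-utilization parameters are a hypothetical way to express how often the compiler manages to fill every slot, and are not a Tensilica metric.

```c
#include <assert.h>

/* Peak throughput in millions of operations per second for a core that
 * can issue `issue_width` operations per cycle at `clock_mhz`. */
unsigned peak_mops(unsigned issue_width, unsigned clock_mhz)
{
    return issue_width * clock_mhz;
}

/* Sustained throughput when only `filled_slots` out of every
 * `total_slots` issue slots carry useful operations. */
unsigned sustained_mops(unsigned issue_width, unsigned clock_mhz,
                        unsigned filled_slots, unsigned total_slots)
{
    assert(total_slots > 0 && filled_slots <= total_slots);
    unsigned long long scaled =
        (unsigned long long)issue_width * clock_mhz * filled_slots;
    return (unsigned)(scaled / total_slots);
}
```

For the 3-issue, 300 MHz case above, peak_mops(3, 300) gives 900. If only two of every three slots were filled, sustained_mops(3, 300, 2, 3) would give 600, so the realized gain depends on how much parallelism the source code exposes.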

Designers can go beyond the capabilities in the major pre-configured functional blocks and use the FLIX capabilities in their own design by using Tensilica's TIE language to specify exactly what's needed.

See our Hot Chips Conference paper, “Long Words and Wide Ports: Reinventing the Configurable Processor.”

Designer Defined I/O Bypasses the System Bus for Maximum Data Throughput

The Xtensa LX processor brings another fundamental breakthrough in embedded processor design: the ability to define direct data interfaces into and out of the processor for maximum data throughput. This ability is a key reason that Tensilica's Xtensa LX is ideal for the SOC dataplane.

Tensilica provides three direct interface capabilities that are described in more detail on our I/O page.

TIE Ports: These function like General Purpose IO (GPIO) wires by providing direct connection to other logic within an SOC or to other Xtensa processors. These are created with simple one-line declarations in a TIE file.

TIE Queues: These function like FIFO interfaces. TIE input Queues function with a familiar pop/empty/data interface to external logic while TIE output Queues present a similar push/full/data interface. All interactions with the Xtensa LX processor pipeline are automatically implemented by the Xtensa Processor Generator.

TIE Lookups: These enable designers to connect RAMs or external devices to Xtensa LX processors. These external memories or devices can be accessed directly from the processor's datapath without using load/store instructions. These interfaces are useful for connecting table lookup RAMs, for example in networking applications, or for connecting long-latency hardware computation units.

Almost 1000 logical connections (Ports+Queues+Lookups), each consisting of up to 1024 pins, can be added to each Xtensa LX processor. Tensilica's Ports and Queues can be utilized every clock cycle without the use of load/store instructions, providing virtually unlimited I/O bandwidth.
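
To make the Queue handshakes concrete, here is a small software model of the external FIFO that a TIE Queue would connect to. It is only a behavioral sketch in C: the type name, function names, and the depth of 4 are assumptions for illustration, and real Queues are declared in TIE and implemented in hardware by the Xtensa Processor Generator.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define QUEUE_DEPTH 4u  /* hypothetical depth of the external FIFO */

typedef struct {
    uint32_t data[QUEUE_DEPTH];
    unsigned head;   /* index of the oldest entry */
    unsigned count;  /* number of valid entries */
} fifo_model;

bool fifo_empty(const fifo_model *q) { assert(q); return q->count == 0; }
bool fifo_full(const fifo_model *q)  { assert(q); return q->count == QUEUE_DEPTH; }

/* Output-queue side (push/full/data): the write succeeds only when the
 * FIFO is not full; a false return models the producer stalling. */
bool fifo_push(fifo_model *q, uint32_t value)
{
    if (fifo_full(q)) {
        return false;
    }
    q->data[(q->head + q->count) % QUEUE_DEPTH] = value;
    q->count++;
    return true;
}

/* Input-queue side (pop/empty/data): the read succeeds only when the
 * FIFO is not empty; a false return models the consumer stalling. */
bool fifo_pop(fifo_model *q, uint32_t *value)
{
    assert(value);
    if (fifo_empty(q)) {
        return false;
    }
    *value = q->data[q->head];
    q->head = (q->head + 1u) % QUEUE_DEPTH;
    q->count--;
    return true;
}
```

In this model a push into a full FIFO or a pop from an empty one simply reports failure. In hardware, the corresponding full and empty signals are what let the two sides exchange data every cycle without going through load/store instructions or the system bus.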

Ideal for Applications Where Low Power is Critical

Tensilica has implemented several automatic features, such as the insertion of fine-grained clock gating, to provide you with a low power processor for mobile applications. In addition, Tensilica's Xenergy energy estimation tool can be used to optimize both the Xtensa LX configuration and TIE instructions, and to tune the software application for energy. See our low power section.

Advantages over Traditional Fixed Processor Cores

  • Base instruction set compatibility: Configurability of a Tensilica processor core never compromises the underlying base Xtensa instruction set, thereby ensuring availability of a robust ecosystem of third party application software and development tools. All configurable, extensible Xtensa processors are compatible with major operating systems, debug probes and ICE solutions. They always come with an automatically generated, complete software development toolchain including an advanced integrated development environment based on the ECLIPSE framework, a world-class compiler, a cycle-accurate SystemC-compatible instruction set simulator, and the full industry-standard GNU toolchain.
  • Smaller code size: The Xtensa LX processor can modelessly issue 24-bit and 16-bit instructions, leading to 25-50% better code density, and therefore smaller memories, than mixed 32- and 16-bit architectures. Since memories typically dominate SOC area, this code density advantage translates into significant SOC area savings.
  • Powerful base ISA: The Xtensa ISA also provides (a) powerful compare-and-branch instructions and zero-overhead loops, which enable the compiler to generate tight, optimized loops, and (b) bit manipulations including funnel shifts and field-extract operations that are typical in applications such as networking that process the fields in packet headers and perform rule-based checks.
  • Extensible ISA: One of the fundamental technology innovations in the Xtensa processor is the ability to easily and seamlessly add new instructions and the associated C data types, along with the software tool chain support and the hardware data path to the processor. The scope of the instructions that can be added is general enough to enable the base RISC Xtensa LX processor to become an 8-way SIMD (single instruction, multiple data) general purpose DSP engine, a 3-instruction issue high-performance processor, or a small, low power cache-less controller. For example, a designer can add multi-cycle execution units, registers, register files, general purpose IO pins (Ports), and FIFO interfaces (Queues). The specification of this new data path and associated instructions and C data types is done in the Tensilica Instruction Extension (TIE) language.
  • Multi-issue VLIW technology: The Xtensa LX processor core features Tensilica’s powerful FLIX technology, which allows the designer to configure the processor as a multi-issue VLIW processor. The Xtensa C/C++ Compiler (XCC) automatically extracts parallelism from C/C++ code and bundles multiple operations into FLIX (VLIW) instructions. In this way, a 3-issue Xtensa processor configuration running at 300 MHz can deliver performance up to the equivalent of a 900 MHz processor. Additionally, the compiler can bundle the branch and load/store instructions in parallel with compute instructions into VLIW instructions to gain a performance boost over straight-line code. This feature can be used selectively when needed, and the FLIX instructions are modelessly intermixed with the standard 16- and 24-bit instructions to avoid code bloat.
  • Configurable local and system interfaces: The designer has flexibility to select the number and width of the local and system interfaces on the Xtensa LX processor. An Xtensa LX processor can have up to two local instruction and data RAMs and ROMs, instruction and data caches, and a single-cycle access general-purpose interface called XLMI. The widths of these local interfaces can be set to 32 bits, 64 bits, or 128 bits, independent of the PIF system interface that can also be set to any of these widths. This allows the designer to design a flexible system and memory architecture around the Xtensa LX processor.
  • Flexible designer-defined I/O interfaces: The designer can specify new interfaces to the data path in the processor that can be used to interface with other RTL and processor blocks in the SOC. These interfaces – Ports and Queues – are instantiations of general purpose I/O pins and FIFO interfaces on the processor that can be accessed directly by operations/instructions without using load/store instructions.
  • Automatically generated, pre-verified processor RTL and software tool chain: The designer-defined extensions and configuration options selected by the designer are taken as input by the Xtensa Processor Generator to automatically generate pre-verified RTL for the processor implementation, along with the entire software tool chain including compilers, debuggers, and simulators (cycle-accurate and fast functional). The designer can thus focus on application development instead of focusing on how to create an application-specific processor or how to create a complete software tool chain to support modifications they make to the processor.