An Instruction Set Architecture Optimized for
Embedded Applications
The Xtensa LX2 architecture starts with base ISA features common to all Xtensa LX2 cores. SOC developers can configure the Xtensa LX2 processor from menus of predefined options and add functional units of their own design to optimize algorithm performance, achieving designs equivalent to hand-coded custom logic blocks in a fraction of the time.
For more information on the Xtensa ISA, download the PDF of the Xtensa ISA databook.

The Xtensa LX2 Architecture
The Xtensa LX2 processor's 32-bit architecture features a compact
instruction set optimized for embedded designs.
The base architecture has a 32-bit ALU, up to 64
general-purpose physical registers, 6 special-purpose
registers and 80 base instructions, including compact
16- and 24-bit (rather than 32-bit) RISC instruction
encoding. The Xtensa LX2 processor implements the
proven Xtensa instruction set architecture (ISA),
which enables designers to achieve significant
code size reductions compared to conventional RISC
cores. Smaller code results in higher performance
and lower power dissipation, both key to saving cost
in highly integrated SOC designs. The Xtensa ISA’s
16- and 24-bit encoding also supports powerful
branch instructions and zero-overhead loops, plus
bit manipulations including funnel shifts and field-extract
operations.
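To make the two bit-manipulation operations named above concrete, here is a minimal sketch in portable C of what a field extract and a funnel shift compute. The function names `extract_field` and `funnel_shift_right` are chosen for illustration only; they are not Xtensa intrinsics or instruction mnemonics, and the code models the arithmetic rather than the processor's actual encoding.

```c
#include <assert.h>
#include <stdint.h>

/* Field extract: return `width` bits of `word` starting at bit `lsb`
 * (bit 0 is the least significant), right-aligned and zero-extended. */
uint32_t extract_field(uint32_t word, unsigned lsb, unsigned width)
{
    assert(width >= 1 && width <= 32 && lsb < 32 && lsb + width <= 32);
    uint32_t mask = (width == 32) ? 0xFFFFFFFFu : ((1u << width) - 1u);
    return (word >> lsb) & mask;
}

/* Funnel shift right: treat hi:lo as one 64-bit value, shift it right
 * by `amount` (0..31), and return the low 32 bits of the result. */
uint32_t funnel_shift_right(uint32_t hi, uint32_t lo, unsigned amount)
{
    assert(amount < 32);
    uint64_t pair = ((uint64_t)hi << 32) | lo;
    return (uint32_t)(pair >> amount);
}
```

For example, `extract_field(0x45000054u, 28, 4)` pulls the 4-bit version field out of the first word of an IPv4 header, the kind of packet-header processing where such operations are typically useful.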
Optional 7-stage Pipeline
To address the growing speed disparity between standard cell logic and memories (memory access speeds have not scaled as well as logic in the migration from 180 nm to 130 nm and now to 90 nm and 65 nm processes), the Xtensa LX2 processor features a configurable pipeline. Designers can select a configuration option for a 7-stage pipeline that adds two clock cycles for memory access if the application requires it. While the Xtensa LX2 processor’s standard 5-stage pipeline is very efficient for many applications, designers employing large local memories or specialized low-power memories with longer access times will benefit from the longer pipeline, which allows a higher system clock frequency.
Processor Extensions - Accelerating Processor
Performance
The Tensilica Instruction Extension (TIE) language is used to describe new instructions, new registers
and execution units, and new I/O ports that are
then automatically added to the Xtensa LX2 processor.
TIE is a Verilog-like language used to describe
desired instruction mnemonics, operands, encoding
and execution semantics. Designers can use the
XPRES Compiler to automatically generate TIE files
and modify the generated TIE files for further
optimizations. TIE files are inputs to the Xtensa
Processor Generator. The Generator automatically
builds a version of the Xtensa LX2 processor and
the complete tool chain that incorporates the new
TIE instructions.
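As a rough illustration of the kind of operation a designer might fold into a single new instruction, the sketch below gives a plain C reference model of a packed, saturating 8-bit add. It is not TIE code and does not use any Tensilica API; the operation and the name `add4x8_sat` are hypothetical, picked only to show what "execution semantics" means for a candidate instruction.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical candidate instruction, modeled in C: add four unsigned
 * 8-bit lanes packed into 32-bit words, saturating each lane at 255. */
uint32_t add4x8_sat(uint32_t a, uint32_t b)
{
    uint32_t result = 0;
    for (unsigned lane = 0; lane < 4; ++lane) {
        unsigned shift = lane * 8;
        uint32_t sum = ((a >> shift) & 0xFFu) + ((b >> shift) & 0xFFu);
        if (sum > 0xFFu) {
            sum = 0xFFu; /* clamp instead of wrapping into the next lane */
        }
        result |= sum << shift;
    }
    return result;
}

/* Small self-check of the reference model. */
void add4x8_sat_self_check(void)
{
    assert(add4x8_sat(0x01020304u, 0x10203040u) == 0x11223344u);
}
```

On a base RISC core this loop costs several instructions per lane; a custom instruction with the same semantics could in principle produce the result in one operation, which is the motivation for extending the instruction set.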

FLIX Architecture - Highly Parallel Implementations
The Xtensa LX2 processor implements Tensilica’s
FLIX (Flexible Length Instruction Xtension) architecture.
FLIX is a configuration option that allows designer-defined
instructions to consist of multiple, independent
operations bundled into a 32-bit or 64-bit instruction
word. Wide 32- or 64-bit FLIX instruction formats
are seamlessly and modelessly intermixed with the
base Xtensa ISA’s existing 16- and 24-bit instructions;
there is no mode-switch penalty for using a
FLIX instruction.
The FLIX architecture allows the implementation
of highly parallel processors with a performance
characteristic of specialty ultra-wide instruction
word processors, without the negative code size
implications typically found in such VLIW or ULIW
solutions.
In fact, Xtensa LX2 processors with FLIX can often
deliver higher performance and smaller code size
at the same time. This performance increase comes
with very little overhead - adding only 2,000 gates
to the size of the processor for instruction decode
and control.
See our Hot Chips Conference paper, “Long Words and Wide Ports: Reinventing the Configurable Processor.”
Unlimited I/O Bandwidth
The Xtensa LX2 processor broke new ground for embedded processor design with unique communication structures that allow designers to avoid using the bus altogether. These new structures are:
- Ports (GPIO) - provide direct connections to other logic within the SOC or to another Xtensa LX2 processor.
- Queues (FIFO interfaces) - provide virtually unlimited I/O bandwidth to other logic or another Xtensa LX2 processor.
- Lookup interfaces - enable designers to directly connect RAMs or an external device to the Xtensa LX2 processor, with no load/store instructions required.
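The Queue behavior described above can be pictured with a small software model. The sketch below is a plain C ring buffer, not the hardware interface itself: the type `queue_model`, the functions, and the depth of 4 are all assumptions made for illustration. In hardware, a full or empty queue would presumably apply back-pressure to the producer or consumer; here that condition is reported as a `false` return value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define QUEUE_DEPTH 4u /* hypothetical depth, chosen for the example */

typedef struct {
    uint32_t data[QUEUE_DEPTH];
    size_t head;  /* index of the next entry to pop */
    size_t count; /* number of entries currently stored */
} queue_model;

void queue_init(queue_model *q)
{
    assert(q != NULL);
    q->head = 0;
    q->count = 0;
}

/* Producer side: returns false when the queue is full. */
bool queue_push(queue_model *q, uint32_t value)
{
    assert(q != NULL);
    if (q->count == QUEUE_DEPTH) {
        return false;
    }
    q->data[(q->head + q->count) % QUEUE_DEPTH] = value;
    q->count++;
    return true;
}

/* Consumer side: returns false when the queue is empty. */
bool queue_pop(queue_model *q, uint32_t *value)
{
    assert(q != NULL && value != NULL);
    if (q->count == 0) {
        return false;
    }
    *value = q->data[q->head];
    q->head = (q->head + 1) % QUEUE_DEPTH;
    q->count--;
    return true;
}
```

The point of the model is that producer and consumer exchange data in order through a dedicated channel, with no shared bus transaction and no memory address involved.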
Ideal for Applications Where Low Power is Critical
Tensilica has implemented several automatic features, such as the insertion of fine-grained clock gating, to provide you with a low-power processor for mobile applications. In addition, Tensilica's Xenergy energy estimation tool can be used to optimize both the Xtensa LX2 configuration and TIE instructions, and to tune the software application for energy. See our low power section.
Advantages over Traditional Fixed Processor Cores
- Base instruction set compatibility: Configuring a Tensilica processor core never compromises the underlying base Xtensa instruction set, thereby ensuring availability of a robust ecosystem of third-party application software and development tools. All configurable, extensible Xtensa processors are compatible with major operating systems, debug probes and ICE solutions, and come with an automatically generated, complete software development toolchain including an advanced integrated development environment based on the Eclipse framework, a world-class compiler, a cycle-accurate SystemC-compatible instruction set simulator, and the full industry-standard GNU toolchain.
- Smaller code size: The Xtensa LX2 can modelessly issue 24-bit and 16-bit instructions, leading to 25-50% better code density and, therefore, smaller memories than mixed 32- and 16-bit architectures. Since memories typically dominate SOC area, this code density advantage translates into significant SOC area savings.
- Powerful base ISA: The Xtensa ISA also provides (a) powerful compare-and-branch instructions and zero-overhead loops, which are very useful for the compiler to generate tight, optimized loops, and (b) bit manipulations including funnel shifts and field-extract operations that are very useful for applications such as networking that process the fields in packet headers and perform rule-based checks.
- Extendable ISA: One of the fundamental technology innovations in the Xtensa processor is the ability to easily and seamlessly add new instructions and the associated C data types, along with the software tool chain support and the hardware data path to the processor. The scope of the instructions that can be added is general enough to enable the base RISC Xtensa LX2 processor to become an 8-way SIMD (single instruction, multiple data) general purpose DSP engine, a 3-instruction issue high-performance processor, or a small, low power cache-less controller. For example, a designer can add multi-cycle execution units, registers, register files, general purpose IO pins (Ports), and FIFO interfaces (Queues). The specification of this new data path and associated instructions and C data types is done in the Tensilica Instruction Extension (TIE) language, which is explained in more detail in a later section.
- Multi-issue VLIW technology: The Xtensa LX2 processor core features Tensilica’s powerful FLIX technology, which allows the designer to configure the processor as a multi-issue VLIW processor. The Xtensa C/C++ Compiler (XCC) automatically extracts parallelism from C/C++ code and bundles multiple operations into FLIX (VLIW) instructions. In this way, a 3-issue Xtensa processor configuration running at 300 MHz can deliver performance up to the equivalent of a 900 MHz processor. Additionally, the compiler can bundle the branch and load/store instructions in parallel with compute instructions into VLIW instructions to gain a performance boost over straight-line code. This feature can be used selectively when needed, and the FLIX instructions are modelessly intermixed with the standard 16- and 24-bit instructions to avoid code bloat.
- Configurable local and system interfaces: The designer has flexibility to select the number and width of the local and system interfaces on the Xtensa LX2 processor. An Xtensa LX2 processor can have up to two local instruction and data RAMs and ROMs, instruction and data caches, and a single-cycle access general-purpose interface called XLMI. The widths of these local interfaces can be set to 32 bits, 64 bits, or 128 bits, independent of the PIF system interface that can also be set to any of these widths. This allows the designer to design a flexible system and memory architecture around the Xtensa LX2 processor.
- Flexible designer-defined I/O interfaces: The designer can specify new interfaces to the data path in the processor that can be used to interface with other RTL and processor blocks in the SOC. These interfaces – Ports and Queues – are instantiations of general purpose I/O pins and FIFO interfaces on the processor that can be accessed directly by operations/instructions without using load/store instructions.
- Automatically generated, pre-verified processor RTL and software tool chain: The designer-defined extensions and configuration options selected by the designer are taken as input by the Xtensa Processor Generator to automatically generate pre-verified RTL for the processor implementation, along with the entire software tool chain including compilers, debuggers, and simulators (cycle-accurate and fast functional). The designer can thus focus on application development instead of focusing on how to create an application-specific processor or how to create a complete software tool chain to support modifications they make to the processor.
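The 300 MHz versus 900 MHz comparison in the multi-issue bullet above is a peak figure: it assumes all three issue slots are filled on every cycle. The sketch below works through that arithmetic in C. The function names and the idea of expressing average slot utilization in hundredths are assumptions made for this example, and the 1.8-slot case in the usage note is an arbitrary illustration, not a measured result.

```c
#include <assert.h>
#include <stdint.h>

/* Upper bound on operations per second: issue slots per cycle times
 * clock frequency in Hz. Real code rarely fills every slot. */
uint64_t peak_ops_per_second(unsigned issue_width, uint64_t clock_hz)
{
    assert(issue_width >= 1);
    return (uint64_t)issue_width * clock_hz;
}

/* Clock a single-issue core would need to match a multi-issue core
 * that fills `avg_filled_slots_x100 / 100` slots per cycle on average
 * (for example, 180 means 1.8 slots per cycle). */
uint64_t equivalent_single_issue_hz(unsigned avg_filled_slots_x100,
                                    uint64_t clock_hz)
{
    assert(avg_filled_slots_x100 >= 100);
    return clock_hz * avg_filled_slots_x100 / 100u;
}
```

With three slots always full, `equivalent_single_issue_hz(300, 300000000)` gives 900,000,000 Hz, matching the figure quoted above; at an assumed average of 1.8 filled slots per cycle the same core would be equivalent to a 540 MHz single-issue processor.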