The TIE (Tensilica Instruction Extension) language is similar to Verilog. It makes optimizing your Xtensa design easy and efficient. TIE can do far more than one page can cover, so the examples here are meant only to give you a few ideas.
Consider the following C function byteswap, which performs a 32-bit endian conversion:
byteswap(unsigned swap_in)
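Only the signature is shown above. As a minimal sketch of what such a function typically looks like, assuming the common mask-and-shift formulation (the name `byteswap_sw` and the use of `uint32_t` are illustrative choices, not taken from the original listing):

```c
#include <stdint.h>

/* Portable 32-bit endian swap (illustrative sketch).
 * Each byte is isolated with a mask, shifted to its mirrored
 * position, and the four pieces are OR-ed together. */
uint32_t byteswap_sw(uint32_t swap_in)
{
    uint32_t swap_out;

    swap_out  = (swap_in & 0x000000FFu) << 24;  /* byte 0 -> byte 3 */
    swap_out |= (swap_in & 0x0000FF00u) << 8;   /* byte 1 -> byte 2 */
    swap_out |= (swap_in & 0x00FF0000u) >> 8;   /* byte 2 -> byte 1 */
    swap_out |= (swap_in & 0xFF000000u) >> 24;  /* byte 3 -> byte 0 */
    return swap_out;
}
```

Written this way, the function is four masks, four shifts, and three ORs, which is why a plain software version takes several instructions on a generic RISC pipeline.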
This function requires many cycles to compute in software; however, it can be computed in a single cycle with a TIE operation. A useful technique for accelerating "hot spots" with TIE is to combine multiple operators into a fusion operation. A fusion operation combines a set of simple, connected operations into a single complex operation. This combination enables the Xtensa processor to perform more computation per operation, creates opportunities to share input operands, and eliminates the need to store and fetch intermediate operands.
All the operations used to perform the byteswap function are fused and optimized in the following TIE operation.
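operation byteswap {out AR swap_out, in AR swap_in} {}
{ assign swap_out = {swap_in[7:0], swap_in[15:8], swap_in[23:16], swap_in[31:24]}; }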
These few lines are all that is necessary to extend the Xtensa processor with a custom operation that performs an endian conversion on a 32-bit word. The TIE compiler automatically extends all the software tools to enable development using the new operation.
The original C code is now replaced by an intrinsic that performs the same function, as indicated in the following sample code:
#include unsigned int myFunc()
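As a hedged sketch of how a call site might use the intrinsic while still building on a host machine, the code below selects between the TIE operation and a plain C fallback. The build flag `HAVE_TIE_BYTESWAP`, the header path, and the helper names `SWAP32`, `swap32_fallback`, and `swap_words` are assumptions made for illustration; only the intrinsic name `byteswap` comes from the text above.

```c
#include <stdint.h>

/* ASSUMPTION: HAVE_TIE_BYTESWAP is a hypothetical build flag, and the
 * header path below is a guess at where a generated intrinsic header
 * would live. Neither is confirmed by the text above. */
#if defined(__XTENSA__) && defined(HAVE_TIE_BYTESWAP)
#include <xtensa/tie/byteswap.h>
#define SWAP32(x) byteswap(x)           /* single TIE operation */
#else
/* Host fallback: the same result computed with shifts and masks. */
static inline uint32_t swap32_fallback(uint32_t x)
{
    return (x << 24) | ((x << 8) & 0x00FF0000u) |
           ((x >> 8) & 0x0000FF00u) | (x >> 24);
}
#define SWAP32(x) swap32_fallback(x)
#endif

/* Convert a buffer of 32-bit words between endiannesses in place. */
void swap_words(uint32_t *buf, unsigned n)
{
    for (unsigned i = 0; i < n; i++) {
        buf[i] = SWAP32(buf[i]);
    }
}
```

Keeping the intrinsic behind one macro means the application code reads the same on both targets, and the fallback doubles as a reference to check the custom operation against.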
The TIE (Tensilica Instruction Extension) language gives designers wide flexibility: you can add multi-cycle, pipelined execution units, register files, state registers, and SIMD arithmetic and logic units; create wide (up to 512-bit) load-store instructions; and add designer-defined I/O Ports, Queues, and Lookup Ports.

Full Extensibility with TIE: From a RISC to a VLIW or Vector Machine
You can create your own TIE instructions to customize your Xtensa processor, or use Tensilica's automated tools to help you create TIE. Tensilica offers a Flexible Length Instruction eXtension (FLIX) generator for VLIW acceleration (see below) and a Manual Fusion Editor (see below) that helps designers create chains, or fusions, of fundamental computation operations to improve performance.
Using TIE instructions with a Tensilica processor core never compromises the underlying base Xtensa instruction set, thereby ensuring availability of a robust ecosystem of third-party application software and development tools. All configurable, extensible Xtensa processors are compatible with major operating systems, debug probes, and ICE solutions, and always come with an automatically generated, complete software development toolchain. That toolchain includes an advanced integrated development environment based on the Eclipse framework, a world-class compiler, a cycle-accurate SystemC-compatible instruction set simulator, and the full industry-standard GNU toolchain.
An Xtensa LX4 processor can become a multi-issue VLIW processor. This, coupled with the Xtensa C/C++ compiler's ability to aggressively extract instruction-level parallelism from C/C++ code and to bundle and schedule multiple operations in a VLIW instruction, can lead to an order-of-magnitude improvement in performance.

The beauty of Tensilica's implementation is that VLIW is only used where it's needed, eliminating the code bloat found in standard VLIW processors. Designers can manually work out the best VLIW instructions or use Tensilica's automated FLIX generator, which profiles a designer's target C code and suggests VLIW instruction specifications that can significantly accelerate the most critical code. By allowing two or three instructions to execute simultaneously, FLIX allows an Xtensa LX processor to act as a 2- or 3-issue VLIW CPU, accelerating general-purpose code by 40-60 percent.
After the processor core has been created using these new VLIW instructions, software developers programming the Xtensa LX core need only use the standard Xtensa C/C++ Compiler (XCC), which automatically extracts the instruction-level parallelism from C/C++ code and bundles operations into VLIW instructions whenever possible. The programmer therefore does not have to modify the application code to take advantage of the VLIW instruction extensions.
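To make "instruction-level parallelism" concrete, here is a small, ordinary C loop of the kind a bundling compiler can exploit. The function name `mix_and_energy` and its arguments are hypothetical, and which operations actually end up in the same VLIW instruction depends on the configured core and compiler, so the comments only mark which operations are independent of one another.

```c
#include <stddef.h>
#include <stdint.h>

/* Plain C with no intrinsics or pragmas (illustrative sketch).
 * Within one iteration, the "mix" chain and the "energy" chain do not
 * depend on each other, so a multi-issue core may run them side by side. */
int32_t mix_and_energy(const int16_t *a, const int16_t *b,
                       int32_t *out, size_t n, int16_t gain)
{
    int32_t energy = 0;

    for (size_t i = 0; i < n; i++) {
        /* Chain 1: multiply, add, store. */
        int32_t scaled = (int32_t)a[i] * gain;
        out[i] = scaled + b[i];

        /* Chain 2: multiply-accumulate, independent of chain 1. */
        energy += (int32_t)b[i] * b[i];
    }
    return energy;
}
```

The source stays portable C; any speedup comes from how the compiler schedules the two chains, not from changes to the code.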
It is possible to create multi-cycle execution units using TIE that are pipelined up to 31 stages. The designer only has to specify the functionality of the units in the high-level TIE language and Tensilica tools automatically generate the decode, pipeline, control, and bypass logic and update the software tool chain (including compiler, debugger, ISS) to recognize the new instructions and registers associated with the execution unit.
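The practical effect of pipelining a multi-cycle unit is that latency grows with the number of stages while throughput can stay at one operation per cycle. The toy C model below illustrates only that timing relationship; it is not how the Tensilica tools generate hardware, and the names `pipe_model`, `pipe_clock`, and `cycles_for` are hypothetical.

```c
#include <stdint.h>

#define MAX_STAGES 31   /* upper bound on pipeline depth quoted in the text */

typedef struct {
    unsigned stages;              /* pipeline depth, 1..MAX_STAGES */
    uint32_t value[MAX_STAGES];   /* operand travelling through each stage */
    int      valid[MAX_STAGES];   /* 1 if the stage holds a live operation */
} pipe_model;

void pipe_init(pipe_model *p, unsigned stages)
{
    if (stages < 1) stages = 1;
    if (stages > MAX_STAGES) stages = MAX_STAGES;
    p->stages = stages;
    for (unsigned s = 0; s < MAX_STAGES; s++) {
        p->value[s] = 0;
        p->valid[s] = 0;
    }
}

/* Advance one clock: shift every stage forward, accept at most one new
 * operation, and return 1 if an operation finished its last stage. */
int pipe_clock(pipe_model *p, int issue, uint32_t in, uint32_t *out)
{
    unsigned last = p->stages - 1;

    for (unsigned s = last; s > 0; s--) {
        p->value[s] = p->value[s - 1];
        p->valid[s] = p->valid[s - 1];
    }
    p->value[0] = in;
    p->valid[0] = issue;

    if (p->valid[last]) {
        *out = p->value[last];
        return 1;
    }
    return 0;
}

/* Count the clocks needed to push n back-to-back operations through. */
unsigned cycles_for(unsigned stages, unsigned n)
{
    pipe_model p;
    unsigned issued = 0, retired = 0, cycles = 0;
    uint32_t result = 0;

    pipe_init(&p, stages);
    while (retired < n) {
        int issue = issued < n;
        retired += (unsigned)pipe_clock(&p, issue, issued, &result);
        issued += (unsigned)issue;
        cycles++;
    }
    return cycles;
}
```

In this model, n back-to-back operations through a pipe of depth s take n + s - 1 clocks, so a deep pipeline costs little once it is kept full.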
The TIE specification can be done in one of two ways: manually or using Tensilica's Manual Fusion Editor, a graphical tool that helps the designer quickly identify the fusions in an application, as shown below.

Tensilica's Manual Fusion Editor
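A fusion candidate of the kind a designer would look for can be sketched in plain C. The example below is an assumption chosen for illustration, a sum of absolute differences, and is not taken from the editor or from any Tensilica example. The helper `abs_diff_acc` groups three simple, connected operations whose intermediate results are used nowhere else, which is what makes the chain a natural candidate for a single fused operation.

```c
#include <stddef.h>
#include <stdint.h>

/* Three simple, connected operations per element (illustrative sketch).
 * The intermediate value "diff" never leaves this chain, so fusing the
 * chain would remove the need to store and fetch it. */
static inline uint32_t abs_diff_acc(uint32_t acc, uint8_t a, uint8_t b)
{
    int diff = (int)a - (int)b;      /* op 1: subtract       */
    if (diff < 0) diff = -diff;      /* op 2: absolute value */
    return acc + (uint32_t)diff;     /* op 3: accumulate     */
}

/* Sum of absolute differences over two byte arrays. */
uint32_t sad(const uint8_t *x, const uint8_t *y, size_t n)
{
    uint32_t acc = 0;

    for (size_t i = 0; i < n; i++) {
        acc = abs_diff_acc(acc, x[i], y[i]);
    }
    return acc;
}
```

If profiling shows the loop in `sad` to be a hot spot, the body of `abs_diff_acc` is the chain you would consider expressing as one TIE operation.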