Full Extensibility with TIE
TIE offers wide flexibility: you can add multi-cycle pipelined execution units, register files, state registers, and SIMD arithmetic and logic units; create wide (up to 128-bit) load/store instructions; and add designer-defined I/O ports (GPIO), queues (FIFO interfaces), and lookup ports.

From a RISC to a VLIW, Vector Machine
You can create your own TIE instructions to customize your Xtensa processor, or use one of Tensilica's automated tools to help you create TIE. One of those tools is the XPRES Compiler, which takes standard C/C++ code as input and automatically identifies TIE optimizations that make that code run much faster. Tensilica also offers a Flexible Length Instruction eXtension (FLIX) generator for VLIW acceleration and a Manual Fusion Editor to help designers create chains, or fusions, of fundamental computation operations to improve performance.
Using TIE instructions with a Tensilica processor core never compromises the underlying base Xtensa instruction set, which ensures the availability of a robust ecosystem of third-party application software and development tools. All configurable, extensible Xtensa processors remain compatible with major operating systems, debug probes, and ICE solutions, and always come with an automatically generated, complete software development toolchain, including an advanced integrated development environment based on the Eclipse framework, a world-class compiler, a cycle-accurate SystemC-compatible instruction set simulator, and the full industry-standard GNU toolchain.
Creating FLIX (VLIW) Acceleration
An Xtensa LX2 processor can become a multi-issue VLIW processor. Coupled with the Xtensa C/C++ compiler's ability to aggressively extract instruction-level parallelism from C/C++ code and to bundle and schedule multiple operations into a VLIW instruction, this leads to an order-of-magnitude improvement in performance.

The beauty of Tensilica's implementation is that VLIW is used only where it is needed, eliminating the code bloat found in standard VLIW processors. Designers can work out the best VLIW instructions manually or use Tensilica's automated FLIX generator, which profiles a designer's target C code and suggests VLIW instruction specifications that can significantly accelerate the most critical code. By allowing two or three instructions to execute simultaneously, FLIX allows an Xtensa LX2 processor to act as a 2- or 3-issue VLIW CPU, accelerating general-purpose code by 40-60 percent.
After the processor core has been created with these new VLIW instructions, software developers programming the Xtensa LX2 core need only use the standard Xtensa C/C++ Compiler (XCC), which automatically extracts the instruction-level parallelism from C/C++ code and bundles operations into VLIW instructions whenever possible. The programmer therefore doesn't have to modify the application code to take advantage of the VLIW instruction extensions.
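To make the "no source changes" point concrete, here is a minimal sketch of the kind of ordinary C loop that exposes instruction-level parallelism. The function name and parameters are hypothetical, chosen only for illustration. Whether and how XCC bundles these operations depends on the FLIX formats configured for a given core, so this shows the input side only, not a guaranteed result.

```c
#include <stddef.h>
#include <stdint.h>

/* Plain C with no intrinsics (hypothetical example function).
   Each iteration performs a load, a multiply, an add and a store that do not
   depend on any other iteration. Independent operations like these are the
   raw material a VLIW-aware compiler can schedule into the same bundle. */
void scale_and_offset(int32_t *dst, const int32_t *src, size_t n,
                      int32_t gain, int32_t offset)
{
    for (size_t i = 0; i < n; ++i) {
        dst[i] = src[i] * gain + offset;
    }
}
```

The same source file compiles for a core with or without FLIX extensions; only the generated instruction stream would differ.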
Creating Multi-cycle Pipelined Execution Units with Fusions
Using TIE, it is possible to create multi-cycle execution units that are pipelined up to 31 stages deep. The designer only has to specify the functionality of the units in the high-level TIE language; Tensilica tools automatically generate the decode, pipeline, control, and bypass logic and update the software toolchain (including compiler, debugger, and ISS) to recognize the new instructions and registers associated with the execution unit.
The TIE specification can be done in one of two ways: manually or using Tensilica's Manual Fusion Editor, a graphical tool that helps the designer quickly identify the fusions in an application, as shown below.

As an example, the figure below shows the TIE and corresponding TIE execution unit that performs a 16x16 multiply and saturates the result down to 16 bits:
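operation MUL_SAT_16 {out AR z, in AR a, in AR b} {}
{
    // Signed 16x16 multiply producing a 32-bit product
    wire [31:0] m = TIEmul(a[15:0], b[15:0], 1);
    // Keep m[23:8] when m[31:23] is a pure sign extension; otherwise saturate
    assign z = {16'b0,
                m[31] ? ((m[31:23] == 9'h1ff) ? m[23:8] : 16'h8000)
                      : ((m[31:23] == 9'b0)   ? m[23:8] : 16'h7fff)};
}
schedule ms {MUL_SAT_16} {def z 2;}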

Creating Pipelined Instructions
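As a cross-check on what the multiply-saturate operation computes, the following is a rough C reference model, not Tensilica code. It assumes the intended behavior is: form the signed 32-bit product of the low 16 bits of each operand, keep bits 23..8 when that value fits in a signed 16-bit result, and otherwise clamp to 0x7fff or 0x8000. The function names are hypothetical.

```c
#include <stdint.h>

/* Interpret the low 16 bits of v as a signed two's-complement value. */
static int32_t sign_extend_16(uint32_t v)
{
    int32_t x = (int32_t)(v & 0xFFFFu);
    return (x >= 0x8000) ? x - 0x10000 : x;
}

/* Floor division by 256, avoiding a right shift of a negative value. */
static int32_t floor_div_256(int32_t p)
{
    return (p >= 0) ? (p >> 8) : -((-p + 255) >> 8);
}

/* Hypothetical software model of the MUL_SAT_16 operation. */
uint32_t mul_sat_16_model(uint32_t a, uint32_t b)
{
    int32_t product = sign_extend_16(a) * sign_extend_16(b);
    int32_t scaled = floor_div_256(product);   /* corresponds to m[31:8] */

    if (scaled > INT16_MAX) scaled = INT16_MAX; /* positive overflow: 0x7fff */
    if (scaled < INT16_MIN) scaled = INT16_MIN; /* negative overflow: 0x8000 */

    /* Upper 16 bits of the result register are zero, as in {16'b0, ...}. */
    return (uint32_t)scaled & 0xFFFFu;
}
```

A model like this is useful as a golden reference when comparing C code that calls the new instruction against a plain software implementation.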
Creating SIMD TIE Execution Units
Creating SIMD execution units with TIE is just as simple. Here is the same multiply-saturate from the previous example, redesigned as a SIMD unit that performs two multiply-saturates at once, one on each 16-bit half of the 32-bit operands.
operation MUL_SAT_16 {out AR z, in AR a, in AR b} {}
{
    // One signed 16x16 multiply per 16-bit lane
    wire [31:0] m1 = TIEmul(a[31:16], b[31:16], 1);
    wire [31:0] m0 = TIEmul(a[15:0],  b[15:0],  1);
    // Saturate each lane independently and pack the two results
    assign z = {m1[31] ? ((m1[31:23] == 9'h1ff) ? m1[23:8] : 16'h8000)
                       : ((m1[31:23] == 9'b0)   ? m1[23:8] : 16'h7fff),
                m0[31] ? ((m0[31:23] == 9'h1ff) ? m0[23:8] : 16'h8000)
                       : ((m0[31:23] == 9'b0)   ? m0[23:8] : 16'h7fff)};
}
schedule ms {MUL_SAT_16} {def z 2;}

SIMD : Exploiting Data Parallelism
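The two-lane behavior can be sketched in C as follows. This is an illustrative model under the same assumption as before (each lane keeps bits 23..8 of its signed product or clamps to 0x7fff / 0x8000); the names `mul_sat_lane` and `mul_sat_2x16_model` are hypothetical and not part of any Tensilica API.

```c
#include <stdint.h>

/* Multiply-saturate for one 16-bit lane (hypothetical helper). */
static uint16_t mul_sat_lane(uint16_t a, uint16_t b)
{
    /* Sign-extend each 16-bit lane value. */
    int32_t x = (a >= 0x8000u) ? (int32_t)a - 0x10000 : (int32_t)a;
    int32_t y = (b >= 0x8000u) ? (int32_t)b - 0x10000 : (int32_t)b;
    int32_t p = x * y;

    /* Floor-divide by 256 without right-shifting a negative value. */
    int32_t s = (p >= 0) ? (p >> 8) : -((-p + 255) >> 8);

    if (s > INT16_MAX) s = INT16_MAX;
    if (s < INT16_MIN) s = INT16_MIN;
    return (uint16_t)((uint32_t)s & 0xFFFFu);
}

/* Two independent lanes packed into one 32-bit register value. */
uint32_t mul_sat_2x16_model(uint32_t a, uint32_t b)
{
    uint16_t hi = mul_sat_lane((uint16_t)(a >> 16), (uint16_t)(b >> 16));
    uint16_t lo = mul_sat_lane((uint16_t)(a & 0xFFFFu), (uint16_t)(b & 0xFFFFu));
    return ((uint32_t)hi << 16) | lo;
}
```

The point of the model is that the lanes never interact: saturation in the upper half has no effect on the lower half, which is what lets the hardware compute both in the same cycles.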
Creating Vector Register Files to Couple with SIMD Units
Designers can specify a new register file with the following simple one-line TIE statement:
regfile VecReg 64 16 vr
This TIE statement instantiates a register file called “VecReg” that consists of 16 registers, each 64 bits wide; the assembler refers to the registers in this register file as “vr0, vr1, vr2, …, vr15”.
Designer-defined vector register files are particularly useful when coupled with SIMD execution units, as shown in the example below. In this example, we create a 4-way SIMD multiply-saturate execution unit (and corresponding instruction) that uses a 64-bit vector register file for its source and destination operands.
regfile VR 64 16 vr
operation MUL_SAT_4x16 {out VR z, in VR a, in VR b} {}
{
    // One signed 16x16 multiply per 16-bit lane of the 64-bit operands
    wire [31:0] m3 = TIEmul(a[63:48], b[63:48], 1);
    wire [31:0] m2 = TIEmul(a[47:32], b[47:32], 1);
    wire [31:0] m1 = TIEmul(a[31:16], b[31:16], 1);
    wire [31:0] m0 = TIEmul(a[15:0],  b[15:0],  1);
    // Saturate each lane independently and pack the four results
    assign z = {m3[31] ? ((m3[31:23] == 9'h1ff) ? m3[23:8] : 16'h8000)
                       : ((m3[31:23] == 9'b0)   ? m3[23:8] : 16'h7fff),
                m2[31] ? ((m2[31:23] == 9'h1ff) ? m2[23:8] : 16'h8000)
                       : ((m2[31:23] == 9'b0)   ? m2[23:8] : 16'h7fff),
                m1[31] ? ((m1[31:23] == 9'h1ff) ? m1[23:8] : 16'h8000)
                       : ((m1[31:23] == 9'b0)   ? m1[23:8] : 16'h7fff),
                m0[31] ? ((m0[31:23] == 9'h1ff) ? m0[23:8] : 16'h8000)
                       : ((m0[31:23] == 9'b0)   ? m0[23:8] : 16'h7fff)};
}
schedule ms {MUL_SAT_4x16} {def z 2;}

Using a Vector Register File with SIMD Instructions
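To show how the register file and the 4-way operation fit together, here is a small C sketch that models the register file as an array of sixteen 64-bit values and the instruction as a function taking register indices. The type `VecRegFile`, the helper `mul_sat_lane`, and the function `mul_sat_4x16_model` are all hypothetical names for illustration, and the per-lane arithmetic assumes the keep-bits-23..8-or-clamp behavior described earlier.

```c
#include <stdint.h>

#define VR_COUNT 16  /* sixteen registers, vr0..vr15 */

/* Hypothetical software stand-in for the 64-bit vector register file. */
typedef struct {
    uint64_t vr[VR_COUNT];
} VecRegFile;

/* Multiply-saturate for one 16-bit lane (hypothetical helper). */
static uint16_t mul_sat_lane(uint16_t a, uint16_t b)
{
    int32_t x = (a >= 0x8000u) ? (int32_t)a - 0x10000 : (int32_t)a;
    int32_t y = (b >= 0x8000u) ? (int32_t)b - 0x10000 : (int32_t)b;
    int32_t p = x * y;
    int32_t s = (p >= 0) ? (p >> 8) : -((-p + 255) >> 8);

    if (s > INT16_MAX) s = INT16_MAX;
    if (s < INT16_MIN) s = INT16_MIN;
    return (uint16_t)((uint32_t)s & 0xFFFFu);
}

/* Model of a 4-way multiply-saturate: vr[z] = vr[a] (op) vr[b], lane by lane.
   Register indices are reduced modulo VR_COUNT to stay in range. */
void mul_sat_4x16_model(VecRegFile *rf, unsigned z, unsigned a, unsigned b)
{
    uint64_t src_a = rf->vr[a % VR_COUNT];
    uint64_t src_b = rf->vr[b % VR_COUNT];
    uint64_t out = 0;

    for (unsigned lane = 0; lane < 4; ++lane) {
        unsigned shift = 16u * lane;
        uint16_t x = (uint16_t)(src_a >> shift);
        uint16_t y = (uint16_t)(src_b >> shift);
        out |= (uint64_t)mul_sat_lane(x, y) << shift;
    }
    rf->vr[z % VR_COUNT] = out;
}
```

In hardware the four lanes are computed in parallel rather than in a loop; the loop here only expresses that each 16-bit lane is handled identically and independently.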