Optimized for Maximum Acceleration
The XPRES Compiler explores a wide range of possible
optimizations using four main acceleration techniques:
operator fusion, SIMD / vectorized operations,
parallelism of independent operations using FLIX,
and specialized operations. By using these techniques,
the XPRES Compiler achieves high-quality results and
helps designers get the speed they need for their
algorithms coded in C/C++.
Operator Fusion
Operator fusion is a technique that creates instructions
(operations) that consist of several simpler operations.
A simple example: combining a basic ADD and SHIFT
operation to form an ADD_SHIFT instruction that
executes in one cycle. This ADD_SHIFT instruction
could replace two sequentially issued instructions,
thus saving a clock cycle and reducing code size.
Fusion can be used to combine existing base Xtensa
ISA instructions or other operations previously
created using TIE. The XCC Compiler, when compiling
C code into a binary executable, utilizes sophisticated
graph-matching algorithms to automatically infer
the best use of the fused operation to replace
individual, simple operations.
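The ADD_SHIFT example above can be sketched in plain C. This is only an illustration: the function names are hypothetical, the shift direction (a logical right shift) is an assumption, and the "fused" function is an ordinary C stand-in for what would be a single TIE instruction on real hardware.

```c
#include <assert.h>
#include <stdint.h>

/* Unfused form: an ADD followed by a SHIFT, two simple operations. */
uint32_t add_then_shift(uint32_t a, uint32_t b, unsigned shift)
{
    assert(shift < 32);           /* shifting by >= 32 is undefined in C */
    uint32_t sum = a + b;         /* ADD */
    return sum >> shift;          /* SHIFT (assumed: logical right shift) */
}

/* Hypothetical stand-in for a fused ADD_SHIFT operation.
 * It computes the same value as the two-step version; on hardware
 * with such an instruction it would issue as one operation. */
uint32_t add_shift_fused(uint32_t a, uint32_t b, unsigned shift)
{
    assert(shift < 32);
    return (a + b) >> shift;
}

/* The kind of source pattern a graph-matching compiler could map
 * onto the fused operation: averaging adjacent pairs of samples. */
void average_pairs(const uint32_t *in, uint32_t *out, int n_pairs)
{
    for (int i = 0; i < n_pairs; i++)
        out[i] = (in[2 * i] + in[2 * i + 1]) >> 1;   /* add, then shift */
}
```

The C source in average_pairs never names the fused operation; the point of the sketch is that the add-then-shift pattern is already visible in ordinary code.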
The XPRES Compiler offers sophisticated visualization
and control mechanisms to allow the designer to
optionally explore and control the number and types
of fusions created.

The Fusion Manager lets designers control the level to which operations are combined to save cycles.
Vector / SIMD
Vector operations increase performance by performing
the same logical operation simultaneously on more
than one data element. Example: a 2-wide vector
addition operation can perform two simultaneous
32-bit additions from one 64-bit register location.
The XPRES Compiler automatically explores 2-, 4-,
and even 8-wide implementations of SIMD operations
and explores vectorized versions of both base Xtensa
ISA operations and manually generated TIE
operations. When compiling C code
into a binary executable, the XCC Compiler utilizes
sophisticated vectorization techniques to “unroll” inner
loops of performance-intensive applications to
take advantage of the SIMD versions of such operations
without the need to modify the C code to explicitly
use the SIMD functions. The XPRES Compiler weighs
both the added hardware cost of parallel execution
units needed for SIMD operations and the added
register-file cost of wider operands when evaluating
SIMD techniques for acceleration.
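The 2-wide case described above can be modeled in portable C. This is a sketch under stated assumptions, not generated TIE: vadd2x32, pack2, and add_vec2 are hypothetical names, and the 64-bit integer simply stands in for a 64-bit register holding two 32-bit lanes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar loop as written in the application: one 32-bit add per iteration. */
void add_scalar(const uint32_t *a, const uint32_t *b, uint32_t *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = a[i] + b[i];
}

/* Pack two 32-bit lanes into one 64-bit value (lane 0 in the low half). */
uint64_t pack2(uint32_t lo, uint32_t hi)
{
    return ((uint64_t)hi << 32) | lo;
}

/* Hypothetical model of a 2-wide vector add: both lanes are added
 * independently, so a carry out of lane 0 never reaches lane 1. */
uint64_t vadd2x32(uint64_t x, uint64_t y)
{
    uint32_t lo = (uint32_t)x + (uint32_t)y;
    uint32_t hi = (uint32_t)(x >> 32) + (uint32_t)(y >> 32);
    return ((uint64_t)hi << 32) | lo;
}

/* The same loop unrolled by two, each step using one 2-wide add.
 * Assumes n is even; a real vectorizer would also handle the remainder. */
void add_vec2(const uint32_t *a, const uint32_t *b, uint32_t *out, size_t n)
{
    assert(n % 2 == 0);
    for (size_t i = 0; i < n; i += 2) {
        uint64_t r = vadd2x32(pack2(a[i], a[i + 1]), pack2(b[i], b[i + 1]));
        out[i]     = (uint32_t)r;          /* lane 0 */
        out[i + 1] = (uint32_t)(r >> 32);  /* lane 1 */
    }
}
```

add_scalar is the form the designer keeps in the source; add_vec2 only illustrates the effect of unrolling onto a 2-wide operation, with half as many loop iterations for the same result.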
FLIX and Specialized Operations
The Xtensa LX processor incorporates Tensilica’s FLIX (Flexible Length Instruction Xtensions) architecture. FLIX allows designer-defined instructions to consist of multiple, independent operations bundled into a compact 32-bit or 64-bit instruction word that coexists with the native 16-bit and 24-bit Xtensa ISA instructions. The FLIX architecture allows the implementation of highly parallel processors with anywhere from 2 to 15 parallel execution units. Thus Xtensa LX processors can deliver the high performance characteristic of specialized VLIW or ULIW (ultra-wide instruction word) processors without the code bloat such processors typically incur.
The XPRES Compiler enables designers to rapidly explore the benefits of FLIX by automating the analysis of the cost-benefit tradeoffs of the parallelism provided by FLIX.
Instruction extensions for the Xtensa LX processor that exploit the FLIX architecture combine multiple independent operations, which the XCC Compiler schedules and bundles at compile time. To achieve higher performance, FLIX supports multiple independent execution pipelines and additional ports on Xtensa LX register files. When creating optimized processor configurations, the XPRES Compiler comprehensively evaluates the performance benefit of FLIX implementations against their hardware cost.
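As a rough illustration of what "independent operations" means here, the loop below contains two updates that depend only on the loaded value and not on each other. The function and type names are hypothetical, and how the XCC Compiler would actually schedule this loop is an assumption; the sketch only shows the kind of parallelism a multi-slot instruction word can exploit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t sum;   /* running sum of all elements */
    uint32_t max;   /* largest element seen so far */
} stats_t;

/* Computes the sum and the maximum of an array in a single pass. */
stats_t sum_and_max(const uint32_t *data, size_t n)
{
    assert(n > 0);                    /* max is undefined for empty input */
    stats_t s = { 0, data[0] };
    for (size_t i = 0; i < n; i++) {
        uint32_t x = data[i];         /* load */
        s.sum += x;                   /* op A: needs x and the old sum */
        if (x > s.max)                /* op B: needs x and the old max */
            s.max = x;
        /* Op A and op B do not read each other's results, so a
         * scheduler targeting a multi-slot format could place them in
         * the same bundle (assumption: enough execution units exist). */
    }
    return s;
}
```

On a single-issue machine the two updates execute one after the other; with two or more slots they are candidates for issuing together, which is the tradeoff between cycles saved and added hardware that the text describes.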