Optimization Techniques for XPRES Compiler

Optimized for Maximum Acceleration

The XPRES Compiler explores a wide range of possible optimizations using four main acceleration techniques: operator fusion, SIMD / vectorized operations, parallelism of independent operations using FLIX, and specialized operations. By combining these techniques, the XPRES Compiler helps designers get the speed they need for their C/C++ coded algorithms.

Operator Fusion

Operator fusion creates instructions (operations) that each combine several simpler operations. A simple example: combining a basic ADD and a SHIFT operation into an ADD_SHIFT instruction that executes in one cycle. This single ADD_SHIFT instruction replaces two sequentially issued instructions, saving a clock cycle and reducing code size. Fusion can combine existing base Xtensa ISA instructions or operations previously created using TIE. When compiling C code into a binary executable, the XCC Compiler uses graph-matching algorithms to automatically infer where the fused operation can best replace the individual, simple operations.
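As a rough illustration of the ADD and SHIFT example, the C sketch below models the unfused and fused forms side by side. The names `add_then_shift`, `add_shift`, and `scaled_sum` are hypothetical and are not part of any Tensilica tool or library; a real fused instruction would be described in TIE and selected by the compiler rather than written as a C function.

```c
#include <stdint.h>

/* Unfused form: two separate operations, as a base ISA would issue them.
 * Assumes sh < 32. */
static inline uint32_t add_then_shift(uint32_t a, uint32_t b, unsigned sh)
{
    uint32_t sum = a + b;   /* ADD   */
    return sum << sh;       /* SHIFT */
}

/* Plain C model of a hypothetical fused ADD_SHIFT operation.
 * It computes the same value in one expression; on hardware with a fused
 * instruction, the whole expression would map to a single instruction. */
static inline uint32_t add_shift(uint32_t a, uint32_t b, unsigned sh)
{
    return (a + b) << sh;
}

/* Example kernel containing the add-then-shift pattern that a
 * pattern-matching compiler could replace with the fused operation. */
uint32_t scaled_sum(const uint32_t *x, const uint32_t *y, int n, unsigned sh)
{
    uint32_t acc = 0;
    for (int i = 0; i < n; i++)
        acc += add_shift(x[i], y[i], sh);
    return acc;
}
```

Both helpers return the same value. The difference the text describes is in issue count: the fused form would execute as one instruction instead of two.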

The XPRES Compiler also provides visualization and control mechanisms that let the designer optionally explore and control the number and types of fusions created.

The Fusion Manager lets designers control the level to which operations are combined to save cycles

Vector / SIMD

Vector operations increase performance by performing the same logical operation on more than one data element simultaneously. Example: a 2-wide vector addition can perform two simultaneous 32-bit additions on operands held in one 64-bit register location. The XPRES Compiler automatically explores 2-, 4-, and even 8-wide SIMD implementations, covering vectorized versions of base Xtensa ISA operations as well as of manually created TIE operations. When compiling C code into a binary executable, the XCC Compiler uses vectorization techniques to “unroll” the inner loops of performance-intensive applications so that they use these SIMD operations, without requiring changes to the C code to call SIMD functions explicitly. When evaluating SIMD for acceleration, the XPRES Compiler weighs both the added hardware cost of the parallel execution units that SIMD operations need and the added register-file cost of wider operands.
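The 2-wide example can be sketched in portable C by treating one 64-bit value as two independent 32-bit lanes. This is only a software model under stated assumptions: `pack2`, `vadd2x32`, `add_arrays`, and `add_arrays_2wide` are hypothetical names, and on an actual SIMD configuration the compiler would vectorize the plain scalar loop on its own, so the hand-unrolled version is shown purely to make the transformation visible.

```c
#include <stdint.h>

/* Pack two 32-bit lanes into one 64-bit "vector register" value. */
static inline uint64_t pack2(uint32_t lo, uint32_t hi)
{
    return ((uint64_t)hi << 32) | lo;
}

static inline uint32_t lane_lo(uint64_t v) { return (uint32_t)v; }
static inline uint32_t lane_hi(uint64_t v) { return (uint32_t)(v >> 32); }

/* Model of a 2-wide add: each 32-bit lane wraps on its own,
 * with no carry propagating from the low lane into the high lane. */
static inline uint64_t vadd2x32(uint64_t a, uint64_t b)
{
    uint32_t lo = lane_lo(a) + lane_lo(b);
    uint32_t hi = lane_hi(a) + lane_hi(b);
    return pack2(lo, hi);
}

/* The scalar loop as a programmer would write it. */
void add_arrays(uint32_t *dst, const uint32_t *a, const uint32_t *b, int n)
{
    for (int i = 0; i < n; i++)
        dst[i] = a[i] + b[i];
}

/* The same loop unrolled by two, one vector add per pair of elements,
 * followed by a scalar step for a leftover element when n is odd. */
void add_arrays_2wide(uint32_t *dst, const uint32_t *a, const uint32_t *b, int n)
{
    int i = 0;
    for (; i + 1 < n; i += 2) {
        uint64_t sum = vadd2x32(pack2(a[i], a[i + 1]), pack2(b[i], b[i + 1]));
        dst[i]     = lane_lo(sum);
        dst[i + 1] = lane_hi(sum);
    }
    for (; i < n; i++)
        dst[i] = a[i] + b[i];
}
```

The two array functions produce identical results. The unrolled form performs roughly half as many add operations, which is the source of the speedup, at the cost of the wider register and the second adder that the text mentions.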

FLIX and Specialized Operations

The Xtensa LX processor incorporates Tensilica’s FLIX (Flexible Length Instruction Xtensions) architecture. FLIX allows designer-defined instructions to consist of multiple, independent operations bundled into a compact 32-bit or 64-bit instruction word that coexists with the native 16-bit and 24-bit Xtensa ISA. The FLIX architecture allows the implementation of highly parallel processors with a range from 2 to 15 parallel execution units. Thus Xtensa LX processors can deliver the high performance characteristic of specialty ultra-wide instruction word processors without the code bloat typically incurred by such VLIW or ULIW processors.

The XPRES Compiler enables designers to rapidly explore the benefits of FLIX by automating the analysis of the cost-benefit tradeoffs of the parallelism provided by FLIX.

Instruction extensions for the Xtensa LX processor that exploit the FLIX architecture combine multiple independent operations, which the XCC Compiler schedules and bundles at compile time. To achieve higher performance, FLIX supports multiple independent execution pipelines and additional ports on Xtensa LX register files. When creating optimized processor configurations, the XPRES Compiler comprehensively evaluates the performance benefit of FLIX implementations against their hardware cost.
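To make the idea of compile-time bundling concrete, the sketch below packs a straight-line sequence of operations into multi-slot bundles, starting a new bundle whenever an operation depends on one already in the current bundle or the bundle is full. It is a deliberately simplified, in-order toy and not the XCC scheduling algorithm: the `Op` structure, `depends`, and `bundle_ops` are hypothetical names, the hazard check is conservative, and a real scheduler would also reorder operations and respect per-slot restrictions.

```c
#include <stdbool.h>

/* Hypothetical operation: writes register dst, reads src1 and src2. */
typedef struct {
    int dst;
    int src1;
    int src2;
} Op;

/* True if op b must not share a bundle with the earlier op a.
 * Conservative: any read-after-write, write-after-write, or
 * write-after-read overlap counts as a dependence. */
static bool depends(const Op *a, const Op *b)
{
    return b->src1 == a->dst || b->src2 == a->dst   /* read after write  */
        || b->dst == a->dst                         /* write after write */
        || b->dst == a->src1 || b->dst == a->src2;  /* write after read  */
}

/* Greedy in-order bundling. Each op joins the current bundle unless the
 * bundle already holds `slots` ops or contains an op it depends on.
 * bundle_of[i] receives the bundle index of op i.
 * Returns the number of bundles (issue cycles in this simple model). */
int bundle_ops(const Op *ops, int n, int slots, int *bundle_of)
{
    if (n <= 0 || slots <= 0)
        return 0;

    int bundle = 0;  /* index of the bundle being filled      */
    int start = 0;   /* index of the first op in that bundle  */

    for (int i = 0; i < n; i++) {
        bool fits = (i - start) < slots;
        for (int j = start; fits && j < i; j++) {
            if (depends(&ops[j], &ops[i]))
                fits = false;
        }
        if (!fits) {       /* close the bundle and open a new one at op i */
            bundle++;
            start = i;
        }
        bundle_of[i] = bundle;
    }
    return bundle + 1;
}
```

For example, with three slots the four operations `r3 = r1 op r2`, `r6 = r4 op r5`, `r7 = r3 op r6`, `r8 = r1 op r4` fit into two bundles, because only the third operation depends on earlier results. With a single slot the same sequence needs four issue cycles.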

QUOTABLE

“It’s not just that XPRES can automatically generate custom hardware from C/C++ code…. Rather, it’s the whole tool chain and design flow that sets Tensilica’s technology apart. Tensilica is closer than any other company to realizing a vision of software-driven automated hardware design that for decades has mesmerized engineers, academic researchers, and entrepreneurs.”

Tom R. Halfhill,
Senior Analyst, Microprocessor Report
