Low Power

See white paper: Processor Core Power Specs: A Cautionary Tale
See white paper: Xenergy Energy Estimator White Paper
See white paper: Optimizing Energy in Processor-Memory Subsystems during SOC Design
See presentation: How to Reduce Power and Energy Consumption through ISA Extension

Getting the Lowest Possible Power From a Processor

How can a processor provide equivalent power savings to an RTL block? Don’t processors, by definition, use more power? Not Tensilica’s Xtensa LX2 processors.

Low power is of utmost concern to almost every Tensilica customer. Tensilica therefore offers the Xenergy tool to help designers find low-power architectures, and also designed the Xtensa processors to be low power from the start.

The Xenergy Tool Guides Low-Power Design

The Xenergy estimator is a unique tool that can be used early in the SOC design cycle to reduce processor and local memory energy requirements by up to half through intelligent design trade-offs.

The Xenergy energy estimator works by computing a per-cycle power-consumption estimate for each distinct instruction of a processor. For each designer-defined instruction extension in an Xtensa processor, created using Tensilica's TIE (Tensilica Instruction Extension) language, Xenergy creates an energy estimate for the new instruction, including the energy consumed by all locally attached memories that are active for that instruction. Then, using the instruction profile produced by Tensilica's instruction set simulator (ISS), it builds a detailed energy consumption profile that matches the exact configuration.
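The flow described above (a per-instruction energy figure combined with an instruction profile) can be sketched roughly as follows. This is a minimal illustration and not Xenergy's actual implementation: the instruction names, the picojoule values, the profile counts, and the function `estimate_energy` are all hypothetical.

```python
# Hypothetical per-instruction energy table (picojoules per execution).
# Each value is assumed to include the energy of any local memory the
# instruction activates. The numbers are illustrative, not Xenergy output.
ENERGY_PJ = {
    "add": 10.0,
    "load": 35.0,        # core plus local data memory access
    "store": 30.0,
    "my_tie_mac": 18.0,  # stands in for a designer-defined (TIE) instruction
}


def estimate_energy(profile, energy_table):
    """Return (total_pj, per_instruction_pj) for an instruction-count profile."""
    per_instr = {op: count * energy_table[op] for op, count in profile.items()}
    return sum(per_instr.values()), per_instr


# Hypothetical instruction counts, of the kind an instruction set
# simulator profile might report for one run of the workload.
profile = {"add": 1000, "load": 400, "store": 200, "my_tie_mac": 500}

total_pj, breakdown = estimate_energy(profile, ENERGY_PJ)
print(f"total: {total_pj / 1000:.1f} nJ")
for op, pj in sorted(breakdown.items(), key=lambda kv: -kv[1]):
    print(f"  {op:<12} {pj / total_pj:.0%}")
```

The per-instruction breakdown is what makes trade-offs visible: in this made-up profile the loads dominate, which would point the designer toward the memory subsystem rather than the arithmetic units.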

[Figure: sample screen shot of setting trade-offs to be evaluated by the Xenergy tool]

The Xenergy tool is used during the process of configuring an Xtensa processor. Designers can immediately see the effect on total energy consumption when they add configuration options (multipliers, DSP engines, a floating point unit, and many additional configuration choices) and designer-defined instructions. They can see the effect of different interface options as well as memory subsystem options.

[Figure: Xenergy comparison chart]

A Focus on Total Energy Consumption

A focus on total energy consumption is key. Too often, designers focus on a static milliwatts-per-megahertz (mW/MHz) power figure but ignore the total energy consumed by the workload. For example, a designer may add a set of custom instructions that increases the size of the processor core, which in turn increases the average power per clock cycle (the mW/MHz figure). But if those custom instructions dramatically lower the total number of clock cycles required to perform a given workload (a target C code application), then the total energy consumed (average power per cycle multiplied by the total number of cycles) can be reduced. Example: a 20% increase in power per clock is offset by a 3x speedup in execution. The mW/MHz figure increases by 20%, but the workload needs only one third of the cycles, so total energy is 1.2 / 3 = 0.4 of the original, a 60% reduction. The reduction in required execution cycles allows the system either to spend much more time in a low-power sleep state or to reduce frequency and voltage, leading to a sharp reduction in both dynamic and leakage power.
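The arithmetic behind this example can be checked with a few lines of Python. The function name `relative_energy` is ours, and the model assumes the clock frequency is unchanged, so energy scales as power per cycle times cycle count.

```python
def relative_energy(power_increase, speedup):
    """Energy of the extended core relative to the baseline (1.0 = unchanged).

    Energy = average power per cycle x number of cycles, so a core that
    draws (1 + power_increase) times the power per cycle but needs only
    1/speedup as many cycles uses (1 + power_increase) / speedup of the
    baseline energy.
    """
    return (1.0 + power_increase) / speedup


ratio = relative_energy(power_increase=0.20, speedup=3.0)
print(f"relative energy: {ratio:.2f}, saving: {1 - ratio:.0%}")
# prints "relative energy: 0.40, saving: 60%"
```

The same function shows where the break-even point lies: any extension whose speedup exceeds its relative power increase (for example, a 1.2x speedup against a 20% power increase) leaves total energy no worse than before.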

The inclusion of memory power consumption is another key aspect of the Xenergy tool. Imagine a scenario where designer-defined processor extensions are used to create custom state registers and register files within an Xtensa processor core, not to appreciably improve execution performance but to significantly decrease accesses to local memory, and thus decrease overall energy. The Xenergy program points out this energy decrease, making it easy for the designer to weigh area, performance, and power trade-offs early in the processor configuration process.
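A rough sketch of that scenario, with core and memory energy accounted for separately, might look like this. All figures (cycle count, picojoules per cycle and per access, and the 10% core overhead) are hypothetical and chosen only to illustrate the trade-off.

```python
def workload_energy_pj(cycles, core_pj_per_cycle, mem_accesses, mem_pj_per_access):
    """Total energy = core energy over all cycles + local memory access energy."""
    return cycles * core_pj_per_cycle + mem_accesses * mem_pj_per_access


# Baseline core: every intermediate value travels through local memory.
baseline = workload_energy_pj(
    cycles=1_000_000, core_pj_per_cycle=10.0,
    mem_accesses=400_000, mem_pj_per_access=30.0,
)

# Extended core (assumed): a custom register file makes each cycle 10% more
# expensive and gives no speedup, but removes three quarters of the accesses.
extended = workload_energy_pj(
    cycles=1_000_000, core_pj_per_cycle=11.0,
    mem_accesses=100_000, mem_pj_per_access=30.0,
)

print(f"baseline: {baseline / 1e6:.1f} uJ, extended: {extended / 1e6:.1f} uJ")
print(f"saving:   {1 - extended / baseline:.1%}")
```

A core-only power figure would rank the extended design as worse, since its per-cycle power is higher; only an estimate that includes the memories reveals the net saving.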

Impact on Software Design

The Xenergy energy estimator is also useful for optimizing software, even on completed chips where the processor, whether an Xtensa configurable processor or a Diamond Standard core, cannot be changed. Traditionally, software developers tune their code for performance or code size using Tensilica’s standard profiling tools. Now they can use the Xenergy tool to fine-tune their C code to reduce the energy dissipated by the processor and its memories. For example, a developer might use the feedback provided by the Xenergy tool to restructure the allocation of data structures in local and main memories, reducing memory and bus accesses and thereby lowering overall energy expenditure.

Processors Designed from the Start to be Energy Efficient

The base Xtensa instruction set architecture, common to both the Xtensa 7 and Xtensa LX2 processor cores, provides the industry’s lowest power and highest performance when compared to legacy fixed architecture cores. Because both cores are fully configurable and designers can add application-specific instructions to the base processor using Tensilica’s patented, automated processor generator, it’s important to compare equivalent processor configurations when comparing to competing processor core offerings.

For example, a high-performance version of the Xtensa LX2 processor uses less than half the die area and power of the equivalent ARM 1136J-S. Note that this is not the base Xtensa LX2 processor; rather, this version of the Xtensa LX2 has been configured as a high-performance, general-purpose CPU.

Processor | Equivalent frequency (0.13u G, worst case) | Power in mW/MHz (0.13u G) | Dhrystone MIPS/mW
ARM 1136J-S | 333 MHz (single issue) | 0.60 | 1.98
Xtensa LX2, 3-way FLIX performance configuration | 600 MHz (three-issue) | 0.170 | 10.4
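To make the comparison figures above easier to interpret, the short sketch below derives a few ratios from them. The numbers are taken directly from the comparison; the dictionary layout and helper names are ours, and the full-speed power is simply mW/MHz multiplied by the listed frequency.

```python
# Figures from the comparison above (0.13u G process).
cores = {
    "ARM 1136J-S": {"mhz": 333, "mw_per_mhz": 0.60, "dmips_per_mw": 1.98},
    "Xtensa LX2 3-way FLIX": {"mhz": 600, "mw_per_mhz": 0.170, "dmips_per_mw": 10.4},
}


def power_at_listed_frequency_mw(core):
    """Power when running at the listed frequency: mW/MHz x MHz."""
    return core["mw_per_mhz"] * core["mhz"]


arm = cores["ARM 1136J-S"]
lx2 = cores["Xtensa LX2 3-way FLIX"]

per_mhz_ratio = lx2["mw_per_mhz"] / arm["mw_per_mhz"]
efficiency_ratio = lx2["dmips_per_mw"] / arm["dmips_per_mw"]

print(f"power per MHz, LX2 relative to ARM: {per_mhz_ratio:.2f}")
print(f"Dhrystone MIPS/mW, LX2 relative to ARM: {efficiency_ratio:.1f}x")
for name, core in cores.items():
    print(f"{name}: {power_at_listed_frequency_mw(core):.1f} mW at {core['mhz']} MHz")
```

Note that the per-MHz ratio and the power at each core's listed frequency answer different questions: the first compares cores at the same clock rate, the second at their respective maximum rates.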

Power Reduction Up to 30 Percent

Several enhancements were made to the Xtensa LX2 processor that reduce total core-plus-memory power by up to 30 percent, including:

  • Enhanced configuration choices that allow independent width selection of main system memory interface, local data memory interface, and instruction memory interface
  • Reduced execution speculation for data memory enables and accesses, leaving data cache and tightly coupled local data memories turned off for longer periods of time
  • An optional wider instruction fetch buffer that reduces instruction memory cycles (and power consumed by those instruction fetch cycles) by up to 50 percent, depending on code set.

Also, Tensilica designed in additional power-down modes, including external power-down of the trace port control and on-chip debug modules, lowering overall system power.

Fine-Grained Clock Gating Throughout

Tensilica has automated the insertion of fine-grained clock gating for every functional element of the Xtensa LX2 processor, including functions conceived of and created by the designer. Clock gating is a very effective power reduction technique that stops the clock to parts of the logic that are not in use on a particular clock cycle, eliminating the dynamic power those parts would otherwise consume. Because automatic insertion of clock gating is only available for restricted RTL design coding styles, standard RTL design often requires manual, error-prone post-layout tuning of clock circuits.
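The effect of clock gating can be illustrated with a toy cycle-level model. This is only a sketch under simplifying assumptions: every block is assumed to cost the same fixed clock energy per clocked cycle, and the block names and enable traces are hypothetical rather than taken from an Xtensa configuration.

```python
def clock_energy(enable_traces, gated, pj_per_clocked_cycle=1.0):
    """Sum clock energy over all blocks.

    enable_traces maps a block name to a list of 0/1 flags, one per cycle,
    where 1 means the block is in use on that cycle. Without gating a block
    is clocked on every cycle; with gating it is clocked only when in use.
    """
    total = 0.0
    for trace in enable_traces.values():
        clocked_cycles = sum(trace) if gated else len(trace)
        total += clocked_cycles * pj_per_clocked_cycle
    return total


# Hypothetical 8-cycle activity traces for three functional blocks.
traces = {
    "multiplier": [0, 0, 1, 0, 0, 0, 1, 0],
    "load_store": [1, 0, 1, 1, 0, 0, 1, 0],
    "custom_tie": [0, 0, 0, 0, 1, 0, 0, 0],
}

ungated = clock_energy(traces, gated=False)
gated = clock_energy(traces, gated=True)
print(f"ungated: {ungated} pJ, gated: {gated} pJ, saving: {1 - gated / ungated:.0%}")
```

The saving depends entirely on how idle each block is, which is why gating at the granularity of individual functional elements matters for rarely used units such as designer-defined instructions.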

Tensilica’s Xtensa LX2 processor delivers direct I/O capability equivalent to RTL design, eliminating the wasted power of moving data into and out of conventional DSP/RISC cores using Load/Store operations.

The Xtensa LX2 processor’s architecture dramatically lowers power consumption in large configurations with many designer-defined functions. But even without designer modification, the Xtensa LX2 processor is designed to use power very efficiently. The minimum configuration of the Xtensa LX2 processor dissipates a miserly 38 µW/MHz in a representative 130 nm process technology. By comparison, the smallest member of the ARM synthesizable processor family, the ARM7TDMI-S, burns 110 µW/MHz in 130 nm technology, nearly three times the power consumption of the Xtensa LX2.

Xtensa LX2 configuration | Power (130 nm LV, area-optimized netlist, typical) | Power (90 nm GT, area-optimized netlist, typical)
Base RTOS configuration (RTOS ready, with caches) | 76 µW/MHz | 94 µW/MHz
Smaller base (no zero-overhead loops) | 47 µW/MHz | 59 µW/MHz
Smallest (no caches, no PIF, no zero-overhead loops) | 38 µW/MHz | 48 µW/MHz

The table above shows the low power consumption that the Xtensa LX2 processor’s fine-grained clock gating helps achieve.
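The per-MHz figures in the table above translate into absolute power once a clock frequency is chosen. The sketch below uses the 130 nm values from the table; the 100 MHz clock is an arbitrary example, not a figure from Tensilica, and the helper name `power_mw` is ours.

```python
# 130 nm LV figures from the table above, in microwatts per MHz.
UW_PER_MHZ_130NM = {
    "Base RTOS configuration": 76,
    "Smaller base": 47,
    "Smallest": 38,
}


def power_mw(uw_per_mhz, clock_mhz):
    """Convert a uW/MHz figure to milliwatts at the given clock frequency."""
    return uw_per_mhz * clock_mhz / 1000.0


CLOCK_MHZ = 100  # hypothetical operating frequency, chosen for illustration

for name, uw_per_mhz in UW_PER_MHZ_130NM.items():
    print(f"{name}: {power_mw(uw_per_mhz, CLOCK_MHZ):.1f} mW at {CLOCK_MHZ} MHz")
```

This linear scaling covers core dynamic power only; leakage and the power of any attached memories would need to be added for a full system estimate.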