See white paper: Processor Core Power Specs: A Cautionary Tale
Ideal for Applications Where Low Power is Critical
Low power is of utmost concern to almost every Tensilica customer. Therefore, Tensilica offers the Xenergy tool to help designers evaluate low-power architectures, and designed the Xtensa processors to be low power from the start.
The Xenergy Tool Guides Low-Power Design
The Xenergy estimator is a unique energy estimator that can be used early in the SOC design cycle to make intelligent design trade-offs that cut the energy requirements of both the processor and its local memories by up to half.

The Xenergy energy estimator works by computing a per-cycle power-consumption estimate for each instruction of a processor. For each designer-defined instruction extension in an Xtensa processor, created using Tensilica's powerful TIE (Tensilica Instruction Extension) language, Xenergy creates an energy estimate for the new instruction, including the energy consumed by all locally attached memories that are active for that instruction. Then, using the instruction profile generated by Tensilica's ISS (instruction set simulator), Xenergy builds a detailed energy-consumption profile that matches the exact processor configuration.
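The estimation flow described above amounts to weighting each instruction's energy by how often the workload executes it. The minimal sketch below illustrates that idea only; the per-instruction energy table (`ENERGY_PER_INSTR_PJ`), the TIE instruction `my_mac`, and the profile counts are all hypothetical placeholders, not Xenergy's actual model, data, or file format.

```python
from collections import Counter

# Hypothetical per-instruction energy table in picojoules per execution.
# A real estimator would derive these values from the configured core;
# these numbers are placeholders for illustration only.
ENERGY_PER_INSTR_PJ = {
    "add": 5.0,
    "l32i": 12.0,    # load: includes the local data-memory access it triggers
    "s32i": 11.0,    # store: includes the local data-memory access it triggers
    "my_mac": 9.0,   # hypothetical designer-defined (TIE) instruction
}


def estimate_energy_pj(profile: Counter) -> float:
    """Sum (execution count x energy per execution) over all instructions."""
    missing = set(profile) - set(ENERGY_PER_INSTR_PJ)
    if missing:
        raise KeyError(f"no energy model for: {sorted(missing)}")
    return sum(count * ENERGY_PER_INSTR_PJ[op] for op, count in profile.items())


# Hypothetical instruction profile, e.g. execution counts from a simulator run.
profile = Counter({"add": 1000, "l32i": 400, "s32i": 200, "my_mac": 300})
print(estimate_energy_pj(profile))
```

Rejecting instructions with no energy entry, rather than silently counting them as zero, keeps a newly added custom instruction from making the estimate look better than it is.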

Figure: Sample screenshot of trade-offs set up for evaluation by the Xenergy tool.
The Xenergy tool is used during the process of configuring an Xtensa processor. Designers can immediately see the effect on total energy consumption when they add configuration options (multipliers, DSP engines, a floating-point unit, and many other configuration choices) and designer-defined instructions. They can also see the effect of different interface and memory subsystem options.

Figure: Xenergy energy comparison chart.
A Focus on Total Energy Consumption
A focus on total energy consumption is key. Too often, designers focus on a static milliwatts-per-megahertz (mW/MHz) power figure but ignore the total energy consumed by the workload. For example, a designer may add a set of custom instructions that increases the size of the processor core, which raises the average power per clock cycle (increasing the mW/MHz figure). But if those custom instructions dramatically reduce the total number of clock cycles required to perform a given functional workload (a target C application), the total energy consumed (energy per cycle multiplied by the total number of cycles) can go down. For example, suppose a 20% increase in power per clock is offset by a 3x reduction in the number of cycles needed. The mW/MHz figure rises by 20%, but total energy consumption falls by 60%, since 1.2 × 1/3 = 0.4 of the original energy. The reduction in required execution cycles allows the system either to spend much more time in a low-power sleep state or to reduce frequency and voltage, leading to a sharp reduction in both dynamic and leakage power.
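The arithmetic above can be checked with a short sketch. It relies on the fact that a mW/MHz figure is really energy per clock cycle (1 mW/MHz = 10^-3 W / 10^6 Hz = 1 nJ per cycle), so total energy is that figure times the cycle count. The baseline of 1.0 mW/MHz and 3 million cycles is a hypothetical value chosen only to reproduce the 20% / 3x example, and the model assumes the mW/MHz figure stays constant (same supply voltage).

```python
def energy_joules(mw_per_mhz: float, cycles: int) -> float:
    """Total energy for a workload.

    1 mW/MHz = 1e-3 W / 1e6 Hz = 1e-9 J (1 nJ) per cycle, so total energy is
    that per-cycle energy times the number of cycles. At a fixed mW/MHz figure
    the result does not depend on the clock frequency actually used.
    """
    return mw_per_mhz * 1e-9 * cycles


# Hypothetical baseline core: 1.0 mW/MHz, workload takes 3 million cycles.
base = energy_joules(mw_per_mhz=1.0, cycles=3_000_000)
# With custom instructions: 20% more power per clock, 3x fewer cycles.
custom = energy_joules(mw_per_mhz=1.2, cycles=1_000_000)

print(f"baseline {base * 1e3:.2f} mJ, custom {custom * 1e3:.2f} mJ, "
      f"savings {1 - custom / base:.0%}")
```

The higher mW/MHz figure looks worse on a datasheet, but the product of per-cycle energy and cycle count is what the battery actually pays for.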
The inclusion of memory power consumption is another key aspect of the new Xenergy tool. Imagine a scenario where designer-defined processor extensions are used to create custom state registers and register files within an Xtensa processor core, not to appreciably improve execution performance, but to significantly reduce accesses to local memory and thus overall energy. Xenergy reports this energy reduction, making it easy for the designer to weigh area, performance and power trade-offs early in the processor configuration process.
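To see why such an extension can pay off even with no speedup, consider a sketch that counts memory and register-file accesses separately. The per-access energies (`E_LOCAL_MEM_ACCESS_PJ`, `E_REGFILE_ACCESS_PJ`) and the access counts below are hypothetical; the only assumption that matters is that a register-file access costs much less than a local-memory access, and the total number of accesses (and hence, roughly, the cycle count) is held constant.

```python
from dataclasses import dataclass

# Hypothetical energy per access in picojoules. Real values depend on the
# process, memory sizes, and core configuration.
E_LOCAL_MEM_ACCESS_PJ = 20.0
E_REGFILE_ACCESS_PJ = 2.0


@dataclass
class AccessCounts:
    mem_accesses: int       # accesses that go to local data memory
    regfile_accesses: int   # accesses served by custom state / register files

    def energy_pj(self) -> float:
        return (self.mem_accesses * E_LOCAL_MEM_ACCESS_PJ
                + self.regfile_accesses * E_REGFILE_ACCESS_PJ)


# Baseline: intermediate values live in local memory.
baseline = AccessCounts(mem_accesses=10_000, regfile_accesses=0)
# Same number of accesses, but most now hit a custom register file.
custom = AccessCounts(mem_accesses=2_000, regfile_accesses=8_000)

print(baseline.energy_pj(), custom.energy_pj())
```

Because the access count is unchanged, a cycle-count profile alone would show no benefit; only an estimator that models memory energy exposes the reduction.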
Impact on Software Design
The Xenergy energy estimator is also useful for optimizing software, even on completed chips where the processor – whether it is an Xtensa configurable processor or a Diamond Standard core – cannot be changed. Traditionally, software developers tune their code for performance or code size using Tensilica’s standard profiling tools. Now they can use the Xenergy tool to fine-tune their C code to reduce energy dissipation by the processor and its memories. For example, a developer might use the feedback provided by the Xenergy tool to restructure the allocation of data structures between local and main memories to reduce memory and bus accesses, which lowers overall energy consumption.
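As a sketch of the data-placement decision described above, the code below places buffers into a limited local data memory and compares the resulting access energy. The per-access energies, buffer names, sizes, and access counts are hypothetical, and the greedy "most accesses per byte first" rule is a simple heuristic swapped in for illustration; it is not guaranteed to find the optimal placement and is not a Xenergy feature.

```python
from typing import List, NamedTuple, Tuple

# Hypothetical energy per access in picojoules.
E_LOCAL_PJ = 15.0   # tightly coupled local data memory
E_MAIN_PJ = 120.0   # main memory reached over the system bus


class Buffer(NamedTuple):
    name: str
    size_bytes: int
    accesses: int   # e.g. taken from a profiling run


def place_buffers(buffers: List[Buffer],
                  local_capacity: int) -> Tuple[List[Buffer], List[Buffer]]:
    """Greedy heuristic: fill local memory with the most-accessed-per-byte buffers."""
    local, main, used = [], [], 0
    for buf in sorted(buffers, key=lambda x: x.accesses / x.size_bytes, reverse=True):
        if used + buf.size_bytes <= local_capacity:
            local.append(buf)
            used += buf.size_bytes
        else:
            main.append(buf)
    return local, main


def placement_energy_pj(local: List[Buffer], main: List[Buffer]) -> float:
    return (sum(b.accesses for b in local) * E_LOCAL_PJ
            + sum(b.accesses for b in main) * E_MAIN_PJ)


buffers = [
    Buffer("coeffs", 1024, 50_000),
    Buffer("frame", 8192, 40_000),
    Buffer("lookup", 2048, 30_000),
]
local, main = place_buffers(buffers, local_capacity=4096)
print([b.name for b in local], placement_energy_pj(local, main))
print(placement_energy_pj([], buffers))  # everything in main memory, for comparison
```

Ranking by accesses per byte favors small, hot buffers, which usually gives the most energy saved per byte of scarce local memory.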
Processors Designed from the Start to be Energy Efficient
Tensilica employs many techniques to reduce power consumption, both in the base hardware and through configuration options that give designers more control over system and memory interfaces. Tensilica processors typically consume less power than other licensable embedded CPUs at equivalent gate counts.
Fine Grained Clock Gating
Clock gating is a very effective power-reduction technique that stops the clock to parts of the logic that are not in use on a particular clock cycle, eliminating their dynamic switching power. Tensilica has automated the insertion of fine-grained clock gating for every functional element of the Xtensa processor, including functions conceived of and created by the designer.
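A rough model shows why gating each functional element matters: without gating, every unit's clock switches on every cycle; with fine-grained gating, a unit's clock switches only on the cycles in which it is used. The unit names, per-cycle energies, and activity fractions below are hypothetical, and the model deliberately ignores gating-cell overhead and leakage (clock gating saves dynamic power only; idle logic still leaks).

```python
# Hypothetical per-cycle clock/switching energy of each unit (pJ) and the
# fraction of cycles in which the unit is actually used by the workload.
UNITS = {
    "alu": (3.0, 0.60),
    "multiplier": (6.0, 0.10),
    "custom_tie_unit": (8.0, 0.05),  # hypothetical designer-defined logic
}


def clock_energy_pj(cycles: int, gated: bool) -> float:
    """Dynamic clock energy over `cycles`, with or without per-unit clock gating."""
    total = 0.0
    for energy_per_cycle, active_fraction in UNITS.values():
        # Gated units are clocked only when active; ungated units every cycle.
        clocked_cycles = cycles * active_fraction if gated else cycles
        total += clocked_cycles * energy_per_cycle
    return total


print(clock_energy_pj(1000, gated=False), clock_energy_pj(1000, gated=True))
```

The rarely used units (here the multiplier and the custom unit) contribute most of the savings, which is why gating designer-defined logic automatically is worthwhile.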
This automation gives the Xtensa processor a significant advantage over hand-written RTL design, where manual, error-prone power and layout tuning of clock circuits is often required. Coarse-grained clock gating is also implemented, idling large areas of the processor during certain long-latency operations, such as a cache line refill.
Memories Affect Power, Too
Accessing local memories is one of the most power-hungry activities an embedded processor performs. Tensilica has designed the Xtensa 7 processor to eliminate unnecessary activation of a local memory interface when that memory is not being addressed by the processor; this power-saving logic is inserted automatically by the Xtensa Processor Generator.
Other enhancements to the Xtensa 7 processor reduce total core-plus-memory power by up to 30 percent, including:
- Enhanced configuration choices that allow independent width selection of main system memory interface, local data memory interface, and instruction memory interface
- Reduced execution speculation for data memory enables and accesses, leaving data cache and tightly coupled local data memories turned off for longer periods of time
- An optional wider instruction fetch buffer that reduces instruction memory cycles (and the power consumed by those instruction fetch cycles) by up to 75 percent, depending on the code.
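A back-of-the-envelope sketch shows where a figure like "up to 75 percent" can come from: in straight-line code, the number of instruction-memory reads is the code size divided by the fetch width, so a fetch path four times wider needs a quarter of the reads. The 32-bit and 128-bit widths and the 96-byte loop body (32 of Xtensa's 24-bit instructions) are assumptions for illustration, not a description of the actual Xtensa 7 fetch buffer; branches and misaligned targets reduce the savings in real code.

```python
import math


def fetch_cycles(code_bytes: int, fetch_width_bits: int) -> int:
    """Instruction-memory reads needed to fetch a straight-line block of code once."""
    fetch_bytes = fetch_width_bits // 8
    return math.ceil(code_bytes / fetch_bytes)


# Hypothetical loop body: 32 instructions of 3 bytes (24 bits) each = 96 bytes.
loop_bytes = 32 * 3
narrow = fetch_cycles(loop_bytes, fetch_width_bits=32)
wide = fetch_cycles(loop_bytes, fetch_width_bits=128)
print(narrow, wide, f"{1 - wide / narrow:.0%} fewer fetches")
```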
Also, Tensilica designed in additional power-down modes, including external power-down of the trace port control and on-chip debug modules, lowering overall system power.
In addition, the designer can configure the external data bus width and internal local memory data widths independently. This allows system-level power optimizations depending on whether the processor is constrained by external or internal instruction and data accesses.
The Xtensa processor's architecture dramatically lowers power consumption in large configurations with many designer-defined functions. But even without the inclusion of designer-defined functionality, the Xtensa 7 processor is designed to use power very efficiently. Power consumption for the Xtensa 7 processor is:
- 38 µW/MHz in 130 nm LV process, speed-optimized netlist. Typical operating conditions. Minimum configuration.
- 48 µW/MHz in 90 nm GT process, speed-optimized netlist. Typical operating conditions. Minimum configuration.