See white paper: Processor Core Power Specs: A Cautionary Tale
Low power is of utmost concern to almost every Tensilica customer. Tensilica therefore offers the Xenergy tool to help designers identify low-power architectures, and designed the Xtensa processors to be low power from the start.
The Xenergy tool is an energy estimator that can be used early in the SOC design cycle to reduce processor and local memory energy requirements by up to half through intelligent design trade-offs.

The Xenergy estimator can be used to optimize processor and local memory energy requirements early in the design cycle.
The Xenergy energy estimator works by computing a per-cycle power-consumption estimate for each instruction of a processor. For each designer-defined instruction extension in an Xtensa processor, created using Tensilica's TIE (Tensilica Instruction Extension) language, Xenergy creates an energy estimate for the new instruction, including the energy consumed by all locally attached memories that are active for that instruction. Then, using the instruction profile produced by Tensilica's ISS, it builds a detailed energy consumption profile that matches the exact processor configuration.
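The estimation flow described above can be pictured as a weighted sum: a per-instruction energy figure multiplied by how often the instruction profile says each instruction executed. The following is only a minimal sketch of that idea, not Xenergy's actual implementation or interface. The instruction names, the picojoule values, and the function `total_energy_pj` are all hypothetical and chosen for illustration.

```python
# Hypothetical per-instruction energy costs in picojoules (illustrative values,
# not Xenergy output). Each figure is assumed to already include the energy of
# any local memory that is active when the instruction executes.
ENERGY_PJ = {
    "add": 10.0,
    "load": 25.0,        # includes a local data-memory read
    "store": 28.0,       # includes a local data-memory write
    "custom_mac": 18.0,  # stands in for a designer-defined TIE instruction
}

# Hypothetical instruction profile: how many times each instruction executed,
# as an instruction set simulator might report it.
PROFILE = {"add": 1000, "load": 400, "store": 200, "custom_mac": 300}


def total_energy_pj(profile, energy_table):
    """Sum (execution count * per-instruction energy) over the profile."""
    return sum(count * energy_table[op] for op, count in profile.items())


if __name__ == "__main__":
    print(f"Estimated energy: {total_energy_pj(PROFILE, ENERGY_PJ)} pJ")
```

Changing an entry in the energy table (for example, to model a new custom instruction) or rerunning with a different profile shows how a configuration change moves the total.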

Sample screen shot setting trade-offs to be evaluated by the Xenergy tool
The Xenergy tool is used during the process of configuring an Xtensa processor. Designers can immediately see the effect on total energy consumption when they add configuration options (multipliers, DSP engines, a floating point unit, and many additional configuration choices) and designer-defined instructions. They can see the effect of different interface options as well as memory subsystem options.

Graph of Xenergy comparison chart
A focus on total energy consumption is key. Too often, designers focus on a static milliwatts per megahertz (mW/MHz) power figure but ignore the total energy consumed by the workload. For example, a designer may add a set of custom instructions that increases the total size of a processor core, which increases the average power per clock cycle (the mW/MHz figure). But if that custom instruction set dramatically lowers the total clock cycles required to perform a given functional workload (a target C code application), then the total energy consumed (average energy per cycle multiplied by the total number of cycles) can be reduced.
Example: an increase in power per clock of 20% is offset by a 3x speedup in execution. The mW/MHz power consumption increases 20%, but the workload now needs one third of the cycles, so total energy falls to 1.2 / 3 = 0.4 of the original, a reduction of 60%. The reduction in required execution cycles allows the system either to spend much more time in a low-power sleep state, or to reduce frequency and voltage, leading to a sharp reduction in both dynamic and leakage power.
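The arithmetic in this example can be checked in a few lines. This is a minimal sketch under the simple model used in the text, where energy is proportional to per-cycle power times cycle count; the function name `relative_energy` is a hypothetical helper, not part of any Tensilica tool.

```python
def relative_energy(power_scale, speedup):
    """Energy relative to the baseline.

    power_scale: new per-cycle power divided by baseline per-cycle power.
    speedup:     baseline cycle count divided by new cycle count.
    """
    return power_scale / speedup


if __name__ == "__main__":
    ratio = relative_energy(power_scale=1.20, speedup=3.0)
    print(f"energy vs. baseline: {ratio:.0%}")   # 40% of the baseline
    print(f"energy reduction: {1 - ratio:.0%}")  # 60% reduction
```

The same function shows the break-even point: a 20% rise in per-cycle power pays off only if the speedup exceeds 1.2x.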
The inclusion of memory power consumption is another key aspect of the Xenergy tool. Imagine a scenario where designer-defined processor extensions are used to create custom state registers and register files within an Xtensa processor core, not to appreciably improve execution performance, but to significantly decrease accesses to local memory and thus decrease overall energy. The Xenergy tool reports this energy decrease, making it easy for the designer to weigh area, performance, and power trade-offs early in the processor configuration process.
The Xenergy energy estimator is also useful for optimizing software, even on completed chips where the processor, whether it is an Xtensa configurable processor or a Diamond Standard core, cannot be changed. Traditionally, software developers tune their code for performance or code size using Tensilica’s standard profiling tools. Now they can use the Xenergy tool to fine-tune their C code to reduce energy dissipation by the processor and its memories. For example, a developer might use the feedback provided by the Xenergy tool to restructure the allocation of data structures in local and main memories to reduce memory and bus accesses, which lowers overall energy expenditure.
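The register-file scenario can be illustrated with a two-term model: core energy plus local memory energy. This is a rough sketch only. All numbers are hypothetical, the function `workload_energy_pj` is an illustrative helper, and a real estimate would come from the tool rather than from constants like these.

```python
def workload_energy_pj(cycles, core_pj_per_cycle, mem_accesses, mem_pj_per_access):
    """Total energy = core energy over all cycles + energy of all memory accesses."""
    return cycles * core_pj_per_cycle + mem_accesses * mem_pj_per_access


# Baseline configuration (hypothetical values).
baseline = workload_energy_pj(
    cycles=1_000_000, core_pj_per_cycle=10.0,
    mem_accesses=400_000, mem_pj_per_access=20.0,
)

# Same cycle count, slightly higher core power from the added register file,
# but far fewer local memory accesses (hypothetical values).
with_registers = workload_energy_pj(
    cycles=1_000_000, core_pj_per_cycle=11.0,
    mem_accesses=100_000, mem_pj_per_access=20.0,
)

if __name__ == "__main__":
    print(f"baseline:       {baseline:,.0f} pJ")
    print(f"with registers: {with_registers:,.0f} pJ")
```

With these assumed figures the cycle count is unchanged and the core draws more power per cycle, yet total energy still drops because the memory term shrinks.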
Power is often the key issue in an SOC design. Tensilica employs many techniques to reduce power consumption, some built into the base hardware and others available as configuration options, allowing more control over system and memory resources. Tensilica processors consistently consume less power than other licensable embedded CPUs at equivalent gate counts.
Tensilica automates the insertion of fine-grained clock gating for every functional element, including those defined by the designer. This automation gives the Xtensa DPU a significant advantage over RTL design where manual, error-prone post-layout tuning of clock circuits is often required.
Coarse-grained clock gating is also implemented: large areas of the processor are idled during certain long-latency operations, such as a cache line refill.
Accessing local memories is one of the most power-consuming activities an embedded processor performs. Tensilica has designed Xtensa processors to eliminate any unnecessary local memory interface activation when that memory is not directly addressed by the processor; this power-saving technique is inserted automatically by the Xtensa Processor Generator.
Other features in Xtensa processors that reduce power by up to 30 percent (in total core plus memory power) include:
Also, Tensilica designed in additional power-down modes, including external power-down of the trace port control and on-chip debug modules, lowering overall system power.
The designer can configure the external data bus width and internal local memory data widths independently. This allows system-level power optimizations depending on whether the processor is constrained by external or internal instruction and data accesses.
The Xtensa processor's architecture dramatically lowers power consumption in large configurations with many designer-defined functions. But even without designer-defined functionality, Xtensa processors are designed to use power very efficiently. The Xtensa processor, in its smallest configuration, is just 0.024 mm² with 12 µW/MHz average dynamic power post place-and-route in 40 LP process technology at 60 MHz.