Low Power Aspects of Configurable Processors
Power dissipation and energy consumption are critical
factors in the design of any nanometer SOC. Configurable
processors like Tensilica’s Xtensa processor
cores provide dramatic benefits in power and energy
efficiency relative to fixed-ISA processor cores.
In fact, configured Xtensa processors can offer
significant power and energy advantages over synthesized
RTL hardware in a range of situations.
For battery-powered applications, energy consumption
(average power consumption multiplied by
the time required to execute a task) is extremely
important because it governs battery life, which
translates into talk time for mobile phones, playing
time for MP3 players, viewing time for portable
media players, etc. Talk time, playing time, and
viewing time readily translate into profit margin
because buyers value products that run longer on
a battery charge. Even products that do not run
on batteries require low-power SOC designs because
SOCs that consume a lot of power increase overall
product cost through the need for more expensive
heat sinks, fans, a larger power supply, and possibly
a larger case to accommodate all of the heat-management
hardware.
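The relationship between power, energy, and battery life described above can be sketched in a few lines of Python. This is a minimal illustration, not a measurement: the function names, the 3.7 Wh battery capacity, and the power figures are all hypothetical values chosen only to show the arithmetic.

```python
def task_energy_joules(avg_power_w: float, task_seconds: float) -> float:
    """Energy for one task: average power multiplied by execution time."""
    return avg_power_w * task_seconds


def battery_life_hours(battery_capacity_wh: float, avg_power_w: float) -> float:
    """Hours of operation for a given battery capacity and average power draw."""
    if avg_power_w <= 0:
        raise ValueError("average power must be positive")
    return battery_capacity_wh / avg_power_w


# Hypothetical figures, for illustration only.
BATTERY_WH = 3.7                                      # assumed battery capacity
life_at_500mw = battery_life_hours(BATTERY_WH, 0.50)  # assumed 500 mW average draw
life_at_250mw = battery_life_hours(BATTERY_WH, 0.25)  # assumed 250 mW average draw

print(f"Battery life at 500 mW: {life_at_500mw:.1f} h")
print(f"Battery life at 250 mW: {life_at_250mw:.1f} h")
```

Under these assumed numbers, halving the average power draw doubles the running time on one charge.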
The Xtensa processor family’s power and
energy advantages come from at least eight contributing
sources:
- The tailored instruction set of a configured
processor core means that the processor more
closely fits the target application than would
a processor with a fixed, general-purpose ISA.
Closer fit to the target task(s) results in fewer
execution cycles required to execute the task(s).
The need for fewer execution cycles means that
the processor either runs for less time at the
same clock frequency or can run at a lower clock
frequency and still execute a task in the required
time. Running for less time or at a lower clock
frequency reduces dynamic power and energy consumption.
- When running at a lower clock frequency, the
processor can usually be run at a lower core
voltage as well, further reducing both dynamic
and static power dissipation and energy consumption.
- A processor that operates at lower voltage
and lower clock frequency can be implemented
with a standard-cell library specifically designed
for low-voltage operation, such as Virtual Silicon’s
Mobilize™ and Artisan’s METRO™ libraries.
Processors implemented with such libraries are
typically smaller, which reduces capacitance
and therefore further reduces dynamic power dissipation.
- Use of low-voltage standard-cell libraries
such as the Mobilize and METRO libraries leads
to yet another source of power and energy reduction:
dynamic voltage and frequency scaling. When the
immediate task being executed by the processor
doesn’t require full-speed operation, the
processor’s clock frequency can often be
dynamically reduced, which permits a corresponding
reduction in processor core operating voltage
and therefore incrementally reduces both static
and dynamic power dissipation and energy consumption
when less than maximum performance is required.
- Processor configuration removes processor features
(and the corresponding logic) not needed by the
target application. Eliminating unneeded hardware
from a configurable processor reduces both its
static and dynamic power and energy consumption
by reducing the number of active circuits in
the processor and by reducing the corresponding
capacitance for those removed circuits. This
benefit accrues regardless of the cell library
used to implement the processor.
- Tensilica’s Xtensa processor has extensive
internal clock gating. Because microprocessors
have very predictable execution profiles—determined
by the instructions presently executing in the
processor’s pipeline—it’s possible
to determine which portions of the processor
need clocking for every instruction at every
stage in the pipeline.
Exhaustive simulation of a large number of Xtensa processor configurations has
allowed Tensilica to add extensive internal clock gating to its processors. This
gating minimizes switching activity inside the processor for each instruction
executed, which reduces dynamic power dissipation and energy consumption.
Configured Xtensa processors can have literally hundreds of different gated
clocks running various portions of the processor, which is far more clock gating
than RTL designers will typically place in manually generated, synthesizable
RTL hardware blocks.
- At a system level, implementing tasks using
configurable processors permits the use of the
processor’s software-initiated sleep mode
to shut down the processor when it’s not
needed for task execution instead of allowing
the processor hardware to continue to execute
idle cycles when no work is required. The sleep
mode places the processor in a very low-power
state. The processor’s operating voltage
can also be reduced while it’s in sleep
mode, which further reduces static power dissipation
and energy consumption. Synthesized synchronous
RTL blocks often lack sleep modes because these modes
require manual effort to design and verify. Therefore,
processors with sleep modes can often consume
less energy than RTL hardware blocks that implement
tasks that run only intermittently.
- Processors employed in SOCs often run multiple
tasks, either sequentially or through time slicing.
This exploitation of a microprocessor’s
ability to multitask can greatly reduce the amount
of hardware needed to implement multiple tasks
in the SOC. (One example of sequential multitasking
is the implementation of multiple audio codecs.
A system may be required to execute many such
codecs, but generally only one at a time.)
RTL blocks are not typically designed to execute multiple tasks, so a separate
hardware block must be used to implement each of these tasks. Consequently,
processor multitasking reduces both dynamic and static power dissipation
and energy consumption relative to the multiple RTL hardware blocks that
would otherwise be needed to implement multiple tasks.
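The first two items in the list above (fewer execution cycles, then lower clock frequency and core voltage) can be made concrete with the standard first-order CMOS switching-power approximation, P = a * C * V^2 * f. The sketch below is an illustration under stated assumptions: the model itself is a textbook approximation rather than anything taken from Tensilica's data, it ignores static (leakage) power, and every number (cycle counts, activity factor, capacitance, voltages, frequencies) is hypothetical.

```python
def dynamic_power_w(activity: float, capacitance_f: float,
                    voltage_v: float, freq_hz: float) -> float:
    """First-order CMOS switching power: activity * C * V^2 * f."""
    return activity * capacitance_f * voltage_v ** 2 * freq_hz


def task_runtime_s(cycles: int, freq_hz: float) -> float:
    """Time needed to execute a task of the given cycle count."""
    return cycles / freq_hz


def task_dynamic_energy_j(cycles: int, activity: float, capacitance_f: float,
                          voltage_v: float, freq_hz: float) -> float:
    """Dynamic energy for one task: switching power times runtime."""
    power = dynamic_power_w(activity, capacitance_f, voltage_v, freq_hz)
    return power * task_runtime_s(cycles, freq_hz)


# Hypothetical fixed-ISA core: 1,000,000 cycles at 200 MHz and 1.2 V.
fixed_isa = dict(cycles=1_000_000, activity=0.2, capacitance_f=1e-9,
                 voltage_v=1.2, freq_hz=200e6)
# Hypothetical tailored core: 4x fewer cycles, so it meets the same
# deadline at 50 MHz, which is assumed to permit a 0.9 V supply.
tailored = dict(cycles=250_000, activity=0.2, capacitance_f=1e-9,
                voltage_v=0.9, freq_hz=50e6)

energy_ratio = task_dynamic_energy_j(**tailored) / task_dynamic_energy_j(**fixed_isa)
print(f"Runtime (fixed ISA): {task_runtime_s(fixed_isa['cycles'], fixed_isa['freq_hz']) * 1e3:.1f} ms")
print(f"Runtime (tailored):  {task_runtime_s(tailored['cycles'], tailored['freq_hz']) * 1e3:.1f} ms")
print(f"Dynamic energy ratio (tailored / fixed ISA): {energy_ratio:.3f}")
```

In this model the frequency cancels out of the per-task energy, so the savings come from the cycle count (linearly) and the supply voltage (quadratically). Both hypothetical cores finish the task in the same 5 ms.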
In addition, the Xtensa LX feature set provides additional ways to reduce
SOC power dissipation and energy consumption:
- The Xtensa LX processor’s designer-defined
TIE ports and queues allow the processor to execute
both very high-speed and very low-speed I/O cycles
without using its buses. TIE ports are very simple
structures compared to the bus operating hardware
and TIE queues can also be simpler, depending
on processor configuration. Consequently, I/O
performed through TIE ports causes less switching
activity within the processor than bus-based I/O,
and TIE queue activity may similarly incur less
switching activity.
Less switching activity directly translates into lower power dissipation
and energy consumption for I/O transactions. Further, because the Xtensa
LX processor’s designer-defined TIE ports and queues are generally
implemented as point-to-point I/O structures, the associated interconnect
between the processor and the target I/O device has much less capacitance
than the more traditional bused structures normally associated with processor
interconnect. Consequently, TIE port and queue operation can consume much
less power and energy than I/O cycles conducted over processor buses and
therefore delivers substantial power and energy savings over the I/O power
requirements of fixed-ISA processors, which lack the ability to communicate
over these alternative, specialized I/O structures.
- Because TIE ports and queues can be very wide
(as wide as 1024 bits), the Xtensa LX processor
can perform high-bandwidth I/O transfers at lower
clock rates than can processors limited to performing
I/O on 32- or 64-bit buses. Lowering the required
clock rate for I/O transfers reduces switching
activity, which reduces dynamic power dissipation
and energy consumption.
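The effect of interface width on the required I/O clock rate can be sketched as a simple division. The example assumes one transfer per clock cycle with no protocol overhead, and the 6.4 Gbit/s target bandwidth is a hypothetical figure. The function name is illustrative and not part of any Tensilica tool.

```python
def required_clock_hz(bandwidth_bits_per_s: float, width_bits: int) -> float:
    """Clock rate needed to sustain a bandwidth, assuming one transfer per cycle."""
    if width_bits <= 0:
        raise ValueError("interface width must be positive")
    return bandwidth_bits_per_s / width_bits


TARGET_BANDWIDTH = 6.4e9  # hypothetical requirement: 6.4 Gbit/s

for width in (32, 64, 1024):
    clock_mhz = required_clock_hz(TARGET_BANDWIDTH, width) / 1e6
    print(f"{width:>4}-bit interface: {clock_mhz:8.2f} MHz")
```

Under these assumptions, a 1024-bit interface sustains the same bandwidth at 1/32 of the clock rate that a 32-bit bus would need.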