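Tensilica offers two ways to accelerate floating-point operations: a single-precision floating-point unit (FPU) option and a double-precision floating-point acceleration option.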
The Xtensa processor's FPU adds the logic and architectural components needed for 32-bit IEEE 754 single-precision floating-point operations. These operations are common in DSP algorithms that need more than 16 bits of precision, such as high-quality audio compression and decompression, printing, and graphics. DSP algorithms that operate on less precise data are also easier to code in floating point, because its wide dynamic range reduces the need for manual scaling. Floating-point hardware also boosts the performance of many programs written in high-level languages such as C.
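To illustrate why floating point is easier to code than 16-bit fixed point, the following C sketch compares a dot product in Q15 format (a common 16-bit fixed-point convention, chosen here as an assumption) with the same dot product in single precision. The names `float_to_q15`, `dot_q15` and `dot_f32` are hypothetical and not part of any Tensilica library. The fixed-point version must manage accumulator width, rescaling and saturation explicitly, while in the floating-point version the exponent absorbs the scaling.

```c
#include <assert.h>
#include <stdint.h>

/* Q15: a 16-bit signed integer that represents value / 32768, range [-1, 1). */
typedef int16_t q15_t;

/* Convert a float to Q15, saturating values outside [-1, 1).
   Magnitudes below 1/32768 truncate to 0. */
static q15_t float_to_q15(float x) {
    float scaled = x * 32768.0f;
    if (scaled >= 32767.0f) return 32767;
    if (scaled <= -32768.0f) return -32768;
    return (q15_t)scaled;                /* truncates toward zero */
}

/* Fixed-point dot product: the programmer manages accumulator width,
   rescaling from Q30 back to Q15, and saturation. */
static q15_t dot_q15(const q15_t *a, const q15_t *b, int n) {
    assert(n >= 0);
    int64_t acc = 0;                     /* each product is in Q30 */
    for (int i = 0; i < n; i++)
        acc += (int32_t)a[i] * b[i];
    acc >>= 15;                          /* Q30 -> Q15 (arithmetic shift assumed) */
    if (acc > 32767) acc = 32767;        /* saturate to the Q15 range */
    if (acc < -32768) acc = -32768;
    return (q15_t)acc;
}

/* Floating-point dot product: the exponent handles scaling automatically. */
static float dot_f32(const float *a, const float *b, int n) {
    float acc = 0.0f;
    for (int i = 0; i < n; i++)
        acc += a[i] * b[i];
    return acc;
}
```

With a = {0.5, 0.25, -0.125, 0.75} and every b[i] = 0.5, both versions produce 0.6875 (22528 in Q15). A small value such as 1e-6, however, converts to 0 in Q15 while remaining representable in single precision.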
The Xtensa processor's FPU is remarkably small for such a full-featured implementation, which offers single-cycle performance on most operations: the total gate count is only about 25K gates, including the floating-point registers.
The FPU is a configuration option for Xtensa processors. The processor uses separate integer and floating-point execution units, which provides high sustained throughput for floating-point-intensive code. Because Xtensa processors are configurable and extensible, the designer can make additional optimizations. If the floating-point option is not selected, the compiler emulates floating-point operations in software.
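When the FPU option is absent, each floating-point operation becomes a call to a software routine. The following C sketch of a single-precision multiply (the name `soft_fmul` is hypothetical) shows the kind of work such a routine performs: unpacking the sign, exponent and mantissa fields, multiplying the mantissas, renormalizing and repacking. It is deliberately simplified. It assumes normal, finite inputs and a result in the normal range, and it truncates instead of rounding to nearest, so it is neither a compliant IEEE 754 implementation nor the Xtensa compiler's actual emulation library.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret float bits as uint32_t and back without violating aliasing rules. */
static uint32_t float_bits(float f) { uint32_t u; memcpy(&u, &f, sizeof u); return u; }
static float bits_float(uint32_t u) { float f; memcpy(&f, &u, sizeof f); return f; }

/* Simplified software single-precision multiply (hypothetical sketch).
   Assumptions: both inputs are normal (no zero, subnormal, infinity or NaN),
   the result stays in the normal range, and the result is truncated
   (round toward zero) rather than rounded to nearest. */
float soft_fmul(float a, float b) {
    assert(isnormal(a) && isnormal(b));
    uint32_t ua = float_bits(a), ub = float_bits(b);

    uint32_t sign = (ua ^ ub) & 0x80000000u;           /* sign of the product */
    int32_t  exp  = (int32_t)((ua >> 23) & 0xFFu)
                  + (int32_t)((ub >> 23) & 0xFFu) - 127; /* remove one bias */
    uint64_t ma = (ua & 0x7FFFFFu) | 0x800000u;        /* restore hidden 1 bit */
    uint64_t mb = (ub & 0x7FFFFFu) | 0x800000u;

    uint64_t prod = ma * mb;        /* 24x24 -> 48-bit product in [2^46, 2^48) */
    if (prod & (1ULL << 47)) {      /* mantissa product >= 2.0: renormalize */
        prod >>= 24;
        exp += 1;
    } else {
        prod >>= 23;
    }
    assert(exp > 0 && exp < 255);   /* overflow and underflow are not handled */
    return bits_float(sign | ((uint32_t)exp << 23) | ((uint32_t)prod & 0x7FFFFFu));
}
```

For example, `soft_fmul(1.5f, 2.0f)` returns 3.0f and `soft_fmul(-2.0f, 3.0f)` returns -6.0f. A production routine must also handle zeros, subnormals, infinities, NaNs, overflow, underflow and correct rounding, which is part of why hardware support pays off for floating-point-intensive code.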
The double-precision floating-point acceleration option adds several instructions that significantly speed up double-precision operations. Customers who need low-energy, moderate-performance double-precision floating point should consider this package. It adds an estimated 4K gates to a standard Xtensa processor when synthesized for low area, and fewer than 7K gates when synthesized for high speed.
Besides speeding up double-precision operations, the instruction extensions that accelerate floating-point division can also speed up integer divide and modulus operations in configurations that do not include the divide option.
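The specific instructions are not described here, but one common way to share division hardware between floating-point and integer division is to compute a reciprocal by Newton-Raphson iteration and then multiply. The following C sketch illustrates that general technique for unsigned 32-bit division and modulus. The function names are hypothetical, the linear seed stands in for whatever table or instruction real hardware would use, and ordinary double-precision arithmetic stands in for the accelerated operations. A final integer correction step keeps the quotient exact even if the reciprocal estimate is slightly off.

```c
#include <assert.h>
#include <stdint.h>

/* Approximate 1/m for m in [0.5, 1) (hypothetical helper).
   The linear seed 48/17 - (32/17)m has relative error at most 1/17;
   each Newton-Raphson step roughly squares the error. */
static double reciprocal_normalized(double m) {
    double x = 48.0 / 17.0 - (32.0 / 17.0) * m;
    for (int i = 0; i < 3; i++)
        x = x * (2.0 - m * x);
    return x;
}

/* Unsigned 32-bit divide and modulus via a reciprocal estimate (hypothetical). */
void udivmod32(uint32_t n, uint32_t d, uint32_t *quot, uint32_t *rem) {
    assert(d != 0);
    /* Normalize d so its top bit is set: dn = d << s lies in [2^31, 2^32). */
    int s = 0;
    uint32_t dn = d;
    while (!(dn & 0x80000000u)) { dn <<= 1; s++; }
    double m = (double)dn / 4294967296.0;            /* m in [0.5, 1) */

    /* d = m * 2^(32 - s), so 1/d = (1/m) * 2^(s - 32). */
    double scale = 1.0;
    for (int i = 0; i < 32 - s; i++) scale *= 0.5;    /* exact power of two */
    double recip = reciprocal_normalized(m) * scale;

    uint64_t q = (uint64_t)((double)n * recip);       /* quotient estimate */
    /* Integer correction: fix any small error left by the estimate. */
    while (q * d > n) q--;
    while ((q + 1) * d <= n) q++;

    *quot = (uint32_t)q;
    *rem  = n - (uint32_t)q * d;                      /* q * d <= n, no wrap */
}
```

For example, `udivmod32(100, 7, &q, &r)` yields q = 14 and r = 2. The correction loops make the result exact regardless of estimate quality; a more accurate reciprocal simply means they run zero or one time.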