Xtensa processors are, by definition, programmable, so you can implement your control functions as software running on them rather than hard-wiring those functions into RTL state machines.
In traditional RTL-based design methodologies, system designers determine the computational elements, or datapath, needed to deliver the desired performance given the chip’s design constraints (gate count, power dissipation, etc.). They also design a state machine to sequence the data through the custom datapath elements in a manner that executes all the necessary permutations of the desired algorithms.
Most RTL verification issues occur in this state machine.

Figure: In RTL design, the state machine creates the highest risk
Designing a new functional block using the Xtensa processor and extensions follows much the same process. The Xtensa processor’s various bus interfaces scale from 32 to 128 bits to provide conventional I/O bandwidth exceeding 40 Gbit/s in 0.13-micron IC-fabrication processes, and ports and queues provide up to one million pins of designer-defined I/O. Designer-defined execution units of very high complexity (including multiple independent operations, SIMD parallelism, and multi-cycle execution) can be implemented in much the same fashion as a designer would define the datapath of an RTL block. Advanced software pipelining techniques can also be used to combine data load/store operations with computational operations to achieve continuous computation without the added latency of serialized load-compute-store sequences.
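The software pipelining idea mentioned above can be sketched in plain C. This is only an illustration of the schedule, not Tensilica code: `compute()` is a hypothetical stand-in for a designer-defined operation, and the function names are assumptions made for this example. On a real extensible processor, the compiler and the instruction set would be expected to overlap these steps; the C below simply shows the load for the next element being issued alongside the compute and store for the current one.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a designer-defined datapath operation. */
static int32_t compute(int32_t x)
{
    return x * 3 + 1;
}

/* Serialized form: every iteration performs load, then compute, then store. */
void process_serial(const int32_t *in, int32_t *out, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        int32_t v = in[i];   /* load    */
        int32_t r = compute(v); /* compute */
        out[i] = r;          /* store   */
    }
}

/* Software-pipelined form: the load for element i+1 sits in the same
 * loop iteration as the compute and store for element i, so the two
 * have no dependency on each other and can overlap. */
void process_pipelined(const int32_t *in, int32_t *out, size_t n)
{
    if (n == 0) {
        return;
    }
    int32_t loaded = in[0];            /* prologue: first load */
    for (size_t i = 0; i + 1 < n; i++) {
        int32_t next = in[i + 1];      /* load for iteration i+1 */
        out[i] = compute(loaded);      /* compute and store for iteration i */
        loaded = next;
    }
    out[n - 1] = compute(loaded);      /* epilogue: drain the last element */
}
```

Both functions produce identical results; only the ordering of the loads relative to the computation differs.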
Once the Xtensa processor datapath additions have been designed, implementation of the target algorithm occurs by writing firmware to sequence the data through the tailored processor. While RTL datapath hardware and Tensilica execution units can deliver similar levels of parallelism and performance, the true advantage of the Tensilica-based methodology is in the design and debug of the control algorithm, or state machine.
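As a rough illustration of a control algorithm written as firmware rather than as hardwired gates, the sketch below sequences a byte stream through a small state machine in C. The frame format, the `START_BYTE` marker, and all type and function names are hypothetical and chosen only for this example; they do not come from any Tensilica tool or library.

```c
#include <stdint.h>

#define START_BYTE 0x7E  /* hypothetical frame marker */

typedef enum { ST_IDLE, ST_HEADER, ST_PAYLOAD, ST_DONE, ST_ERROR } state_t;

typedef struct {
    state_t state;
    uint8_t remaining; /* payload bytes still expected */
    uint8_t checksum;  /* running 8-bit sum of payload bytes */
} ctrl_t;

/* Reset the controller to its idle state. */
void ctrl_init(ctrl_t *c)
{
    c->state = ST_IDLE;
    c->remaining = 0;
    c->checksum = 0;
}

/* Advance the controller by one input byte and return the new state. */
state_t ctrl_step(ctrl_t *c, uint8_t byte)
{
    switch (c->state) {
    case ST_IDLE:
        if (byte == START_BYTE) {
            c->state = ST_HEADER;      /* next byte is the length */
        }
        break;
    case ST_HEADER:
        c->remaining = byte;
        c->checksum = 0;
        c->state = (byte == 0) ? ST_ERROR : ST_PAYLOAD;
        break;
    case ST_PAYLOAD:
        c->checksum = (uint8_t)(c->checksum + byte);
        c->remaining--;
        if (c->remaining == 0) {
            c->state = ST_DONE;
        }
        break;
    case ST_DONE:
    case ST_ERROR:
        break;                         /* terminal until ctrl_init() */
    }
    return c->state;
}
```

Changing a transition here means editing a `case` and recompiling, whereas the equivalent change to a hardwired state machine means changing gates.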
RTL design methods implement the algorithm using hardwired gates. A design method based on the Xtensa processor implements the algorithm as firmware. The approach based on the Xtensa processor offers many advantages. Software running on the Xtensa instruction set simulator (ISS) simulates at speeds up to a million cycles per second on a simple PC host.
Gates in a hard-coded state machine built in Verilog or VHDL must be verified at the RTL or netlist level using simulation environments that run at only tens of cycles per second. In addition, the difficult-to-design state-machine logic required when using an RTL-based design approach is automatically generated by the Xtensa Processor Generator. The Xtensa processor’s control logic is pre-verified and correct by construction. Consequently, verification of systems based on Xtensa processors requires much less time.
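To put the two simulation speeds side by side, the small helper below converts a cycle count into wall-clock time. The workload of 10 million cycles and the RTL rate of 50 cycles per second are assumptions picked for illustration (the text says only "tens of cycles per second"); the ISS rate is the "up to a million cycles per second" figure quoted above.

```c
/* Wall-clock seconds needed to simulate `cycles` at `cycles_per_second`. */
double sim_seconds(double cycles, double cycles_per_second)
{
    return cycles / cycles_per_second;
}

/* Assumed workload and rates, for illustration only. */
#define WORKLOAD_CYCLES 1.0e7 /* hypothetical test run: 10 million cycles */
#define ISS_RATE        1.0e6 /* ISS: up to a million cycles per second */
#define RTL_RATE        50.0  /* assumed value within "tens of cycles/s" */
```

With these assumed inputs, `sim_seconds(WORKLOAD_CYCLES, ISS_RATE)` is 10 seconds, while `sim_seconds(WORKLOAD_CYCLES, RTL_RATE)` is 200,000 seconds, or roughly 2.3 days.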
Bugs found in the firmware running on an Xtensa processor can be repaired in minutes with a simple software change. Bugs found in hardwired gates after chip tapeout require million-dollar mask-set changes and months of delay to fabricate new silicon.
Xtensa processors let you eliminate months of HDL design simulation and prevent multi-million-dollar re-spins to fix bugs. With Xtensa processors, post-silicon “bug fix” cycles require only a 10-minute re-compilation of firmware code.
Three examples, ranging from simple to complex, illustrate how datapath extensions allow extensible processors to replace RTL hardware in a variety of situations. Read on to find out more about these examples.