Get your ASICs and SOCs off the Bus!
Forget everything you know about processor interface and interconnect. Tensilica’s Xtensa LX processor breaks just about all the rules and allows high-performance, RTL-like communication between processors through its unique port and queue capabilities. Communication no longer has to go through a slow, shared 32-bit bus. Direct wires and queues can now be set up between processors, and between processors and existing RTL blocks, to significantly speed communication.
But before we go into more detail on this, let’s step back and look at the basic requirements for processor interface and interconnect.
Different applications make different demands on communication between the processor blocks. Four questions capture the most essential system-communications performance issues:

The figure above highlights the three basic forms of interface natural to processors tuned for SOC applications:
Direct processor-to-processor connections reduce cost and latency. Here’s a simple example of a direct connection, possible with Xtensa LX processors:

The power of these direct connections, available only from Tensilica, can’t be overstated. These connections allow the Xtensa LX processor to communicate as fast and as flexibly as RTL blocks.
Ports are wires that directly connect two Xtensa LX processors, or an Xtensa LX processor to external RTL. Port connections can be as wide as 1024 bits, so wide data types can be transferred easily without multiple load/store operations. More than one million signals (1024 ports, each 1024 bits wide) can be instantiated. This may seem an outrageous number, far exceeding the performance demands of real systems today (it provides 350 terabits/sec of direct data flow per processor in a 130 nm CMOS process), but this huge capacity for I/O bandwidth clearly demonstrates that old notions of the I/O bottlenecks inherent in processor-based design solutions are now obsolete.
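The port figures above can be checked with a little arithmetic. The sketch below is only a back-of-the-envelope calculation: it assumes every port wire carries one bit per clock cycle, and it derives the clock rate implied by the quoted 350 terabits/sec figure. That clock rate is an inference, not a number stated in this article.

```python
# Back-of-the-envelope check of the port bandwidth figures.
# Assumption (not from the article): each wire moves one bit per clock cycle.

PORTS = 1024                      # maximum number of ports quoted in the article
BITS_PER_PORT = 1024              # maximum width of each port, in bits
QUOTED_BANDWIDTH_BPS = 350e12     # 350 terabits/sec, as quoted in the article

# Total number of individual signal wires.
total_signals = PORTS * BITS_PER_PORT

# Clock frequency that would be needed to reach the quoted bandwidth
# if every wire transferred one bit per cycle.
implied_clock_hz = QUOTED_BANDWIDTH_BPS / total_signals

print(f"total signals: {total_signals:,}")
print(f"implied clock: {implied_clock_hz / 1e6:.1f} MHz")
```

Under that assumption, the "one million signals" figure is 1024 × 1024 = 1,048,576 wires, and the quoted bandwidth corresponds to a clock in the region of a few hundred megahertz.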
While ports are ideal for quickly conveying control and status information, queues provide a high-speed mechanism for transferring streaming data. From the programmer’s viewpoint, input queues and output queues operate like traditional processor registers, with one notable difference: there’s no need to load the data before computation or store it afterward, which saves valuable cycles in critical loops. Each queue added to an Xtensa LX processor can sustain data rates as high as one transfer every clock cycle. Custom instructions can perform multiple queue operations per cycle, perhaps combining inputs from two input queues with local data and sending the computed values to two output queues. The high bandwidth and low control overhead of queues allow the Xtensa LX processor to be used in applications with extreme data rates.
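To make the queue behavior concrete, here is a minimal software model of the kind of custom instruction described above: one operation per simulated cycle that reads two input queues, combines the values with local data, and writes two output queues. It is a plain Python simulation, not Tensilica tooling or TIE code; the names `fused_queue_op`, `run` and `local_gain`, and the sum/difference arithmetic, are hypothetical choices made for illustration.

```python
from collections import deque


def fused_queue_op(in_a, in_b, out_x, out_y, local_gain):
    """Model one cycle of a hypothetical custom instruction.

    Pops one value from each input queue, combines them with local data
    (local_gain), and pushes one result to each output queue.
    Returns False (a stall) if either input queue is empty.
    """
    if not in_a or not in_b:
        return False
    a = in_a.popleft()
    b = in_b.popleft()
    out_x.append(local_gain * (a + b))   # first output stream
    out_y.append(local_gain * (a - b))   # second output stream
    return True


def run(samples_a, samples_b, local_gain=2):
    """Stream two input sequences through the modeled instruction."""
    in_a, in_b = deque(samples_a), deque(samples_b)
    out_x, out_y = deque(), deque()
    cycles = 0
    # No explicit loads or stores: each cycle consumes and produces queue data.
    while fused_queue_op(in_a, in_b, out_x, out_y, local_gain):
        cycles += 1
    return list(out_x), list(out_y), cycles


if __name__ == "__main__":
    print(run([1, 2, 3], [4, 5, 6]))
```

In this model, three input pairs are consumed in three cycles, with four queue transfers (two reads, two writes) per cycle and no separate load or store steps in the loop.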
Ports and queues specified by the designer are automatically added to the Xtensa LX processor and are fully modeled by Tensilica’s Xtensa Processor Generator. The full behavior of the port or queue, just like any other modification made to the Xtensa LX processor, is automatically reflected in the custom software development tools, instruction set simulator, bus functional model and EDA scripts, all within about an hour. And because the process is automated using Tensilica’s patented technology, the result is pre-verified and correct by construction, so there is no need to re-verify the processor.
For more information, get a copy of the book “Engineering the Complex SOC: Fast, Flexible Design with Configurable Processors,” by Chris Rowen, published by Prentice Hall.