Processor Interface and Interconnect
Forget everything you know about processor interface
and interconnect. Tensilica’s Xtensa LX processor
breaks just about all the rules and allows high-performance,
RTL-like communication between processors with
its unique ports and queues capabilities.
No longer must all communications
go through a slow, shared 32-bit bus. Now direct
wires and queues can be set up between processors
and between processors and existing RTL blocks
to significantly speed communication.
But before we go into more detail on this, let’s
step back and look at the basic requirements for
processor interface and interconnect.
Communications Demands on Processor Blocks
Different applications make different demands
on communication between the processor blocks.
Four questions capture the most essential system-communications
performance issues:
- Required bandwidth – To
sustain the required throughput of a function
block, what sustained input data and output data
bandwidths are necessary?
- Sensitivity to latency – What
response latency (average and worst case) is
required for a functional block’s requests
on other memory or logic functions?
- Data granularity – What
is the typical size of a transfer request – a
large data block or a single word?
- Blocking or nonblocking
communications – Can the computation
be organized so that the function block can
make a request and then proceed with other
work without waiting for the response to the
request?
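As a rough illustration of how the four questions might be written down for one function block, here is a minimal Python sketch. The record type `CommRequirements`, the helper `bus_utilization`, and all of the numbers are hypothetical and chosen only for illustration; they are not taken from any Tensilica tool. The helper compares the required sustained bandwidth against the raw capacity of an idealized bus that moves one full-width word per clock.

```python
from dataclasses import dataclass


@dataclass
class CommRequirements:
    """Hypothetical record of the four questions for one function block."""
    input_bits_per_sec: float       # required sustained input bandwidth
    output_bits_per_sec: float      # required sustained output bandwidth
    worst_case_latency_cycles: int  # tolerable worst-case response latency
    transfer_bits: int              # typical transfer size (data granularity)
    nonblocking: bool               # True if work can continue after a request


def bus_utilization(req: CommRequirements, bus_width_bits: int, bus_clock_hz: float) -> float:
    """Fraction of an idealized bus's raw capacity this block would consume.

    Assumes one full-width transfer per clock and no arbitration overhead,
    so a real shared bus would look worse than this figure.
    """
    capacity_bits_per_sec = bus_width_bits * bus_clock_hz
    required = req.input_bits_per_sec + req.output_bits_per_sec
    return required / capacity_bits_per_sec


# Illustrative block: 16-bit samples in and out at 100 Msamples/s each.
example = CommRequirements(
    input_bits_per_sec=16 * 100e6,
    output_bits_per_sec=16 * 100e6,
    worst_case_latency_cycles=4,
    transfer_bits=16,
    nonblocking=False,
)

# Share of a hypothetical 32-bit, 200 MHz bus taken by this one block.
utilization = bus_utilization(example, bus_width_bits=32, bus_clock_hz=200e6)
```

With these made-up numbers, a single block would already occupy half of the idealized bus, which is the kind of pressure that motivates the dedicated connections described in the following sections.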
Basic Configurable, Extensible Processor Interfaces

The figure above highlights the three basic forms
of interface natural to processors tuned for SOC
applications:
- 1. Memory-mapped, wide interface – typically
implemented as a local-memory connection, ideal
where high-bandwidth, low-latency data access
is required.
- 2. Memory-mapped, block-sized connection – typically
implemented as a bus connection (the most popular,
traditional processor interface).
- 3. Instruction-mapped, arbitrary-sized connection – implemented
as a direct point-to-point connection. Instruction-mapped
connections can range from a single bit
to thousands of bits. This connection allows
RTL-equivalent data transfer speeds. Only Tensilica’s
Xtensa LX processor allows this direct communication.
Direct processor-to-processor connections reduce
cost and latency. Here’s a simple example
of a direct connection, possible with Xtensa LX
processors:
Direct Processor-to-Processor Port

The power of these direct connections, available
only from Tensilica, should not be underestimated.
These connections allow the Xtensa LX processor
to communicate as fast and as flexibly as RTL blocks.
Tensilica’s Ports and Queues
Ports are wires that directly connect two Xtensa
LX processors or an Xtensa LX processor to external
RTL. Port connections can be arbitrarily wide,
allowing wide data types to be transferred easily
without the need for multiple load/store operations.
As many as one million signals (1024 ports, each
1024 bits wide) can be instantiated, providing
350 terabits/sec of direct data flow per processor
in a 130 nm CMOS process. While this may seem to be
an outrageous number, far exceeding the performance
demands of real systems today, this huge capacity for
I/O bandwidth clearly demonstrates that old notions
of the I/O bottlenecks inherent in processor-based
design solutions are now obsolete.
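The figures quoted above can be cross-checked with a few lines of arithmetic. The sketch below assumes that every port signal carries one bit per clock cycle; under that assumption, the quoted 350 terabits/sec implies a clock of roughly 334 MHz. The clock figure is derived here and is not stated in the text.

```python
PORTS = 1024                  # maximum number of ports (from the text)
BITS_PER_PORT = 1024          # maximum width of each port (from the text)
QUOTED_BITS_PER_SEC = 350e12  # 350 terabits/sec (from the text)

# Total number of wires: 1024 * 1024 = 1,048,576, i.e. "one million signals".
signals = PORTS * BITS_PER_PORT

# Assumption: each signal transfers one bit per clock cycle, so
# bandwidth = signals * clock, and the implied clock is bandwidth / signals.
implied_clock_hz = QUOTED_BITS_PER_SEC / signals

print(f"{signals} signals, implied clock of about {implied_clock_hz / 1e6:.0f} MHz")
```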
While ports are ideal for quickly conveying control
and status information, queues provide a high-speed
mechanism to transfer streaming data. Input queues
and output queues operate, from the programmer’s
viewpoint, like traditional processor registers – with
the notable exception that data is always available.
There’s no need to load or store the data
before and after computation, which saves valuable
cycles in critical loops. Queues can sustain data
rates as high as one transfer every clock cycle
for each queue added to an Xtensa LX processor.
Custom instructions can perform multiple queue
operations per cycle, perhaps combining inputs
from two input queues with local data and sending
the computed values to two output queues. The high
bandwidth and low control overhead of queues allow
the Xtensa LX processor to be used in applications
with extreme data rates.
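To make the queue behavior more concrete, here is a small behavioral model in Python. It is only a sketch: the functions `fused_step` and `run`, the `gain` value standing in for "local data", and the sum/difference computation are all hypothetical, and none of this is Tensilica's instruction-extension language or an Xtensa API. Each call to `fused_step` stands for one custom instruction that pops one value from each of two input queues, combines them with local data, and pushes results to two output queues, with no separate load or store steps.

```python
from collections import deque


def fused_step(in_a: deque, in_b: deque, out_sum: deque, out_diff: deque, gain: int) -> None:
    """Model one cycle of a hypothetical custom instruction.

    Pops one operand from each input queue, combines them with local
    data (gain), and pushes one result to each output queue.
    """
    a = in_a.popleft()
    b = in_b.popleft()
    out_sum.append(gain * (a + b))
    out_diff.append(gain * (a - b))


def run(a_values, b_values, gain=2):
    """Drain both input queues, counting one modelled cycle per step."""
    in_a, in_b = deque(a_values), deque(b_values)
    out_sum, out_diff = deque(), deque()
    cycles = 0
    # A hardware queue would stall the processor when empty; this model
    # simply stops once either input queue has no more data.
    while in_a and in_b:
        fused_step(in_a, in_b, out_sum, out_diff, gain)
        cycles += 1
    return list(out_sum), list(out_diff), cycles


sums, diffs, cycles = run([1, 2, 3], [4, 5, 6])
```

In this model, three pairs of inputs are consumed and six results produced in three steps, which mirrors the claim of four queue operations in a single instruction. Real queue timing, stalls on full output queues, and queue depth are not modelled.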
Ports and queues specified by the designer are
automatically added to the Xtensa LX processor
and are fully modeled by Tensilica’s
Xtensa Processor Generator. The full behavior of
the port or queue, just like any other modification
made to the Xtensa LX processor, is automatically
reflected in the custom software development tools,
instruction set simulator, bus functional model
and EDA scripts – within about an hour. And
because it’s automated using Tensilica’s
patented technology, it’s pre-verified and
correct by construction – no need to re-verify
the processor.
For more information, get a copy of the book “Engineering
the Complex SOC: Fast, Flexible Design with Configurable
Processors,” by Chris Rowen, published
by Prentice Hall.