Interconnect

Processor Interface and Interconnect

Forget everything you know about processor interface and interconnect. Tensilica’s Xtensa LX processor breaks just about all the rules: its unique ports and queues allow high-performance, RTL-like communication between processors. No longer must all communications go through a slow, shared 32-bit bus. Designers can now set up direct wires and queues between processors, and between processors and existing RTL blocks, to speed communication significantly.

But before we go into more detail on this, let’s step back and look at the basic requirements for processor interface and interconnect.

Communications Demands on Processor Blocks

Different applications make different demands on communication between the processor blocks. Four questions capture the most essential system-communications performance issues:

  1. Required bandwidth – To sustain the required throughput of a function block, what sustained input data and output data bandwidths are necessary?
  2. Sensitivity to latency – What response latency (average and worst case) is required for a functional block’s requests to other memory or logic functions?
  3. Data granularity – What is the typical size of a transfer request – a large data block or a single word?
  4. Blocking or nonblocking communications – Can the computation be organized so that the function block can make a request and then proceed with other work without waiting for the response to the request?
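
One way to make the four questions concrete is to write down a block’s answers and derive its bandwidth and request rate from them. The Python sketch below is illustrative only: the `CommProfile` class, its field names and the 1080p, 30 frames/sec pixel-stream numbers are hypothetical assumptions, not part of any Tensilica tool.

```python
from dataclasses import dataclass


@dataclass
class CommProfile:
    """Hypothetical record of one function block's answers to the four questions."""
    in_bytes_per_item: int    # Q1: input data consumed per processed item
    out_bytes_per_item: int   # Q1: output data produced per processed item
    items_per_sec: int        # Q1: throughput the block must sustain
    max_latency_cycles: int   # Q2: worst-case response latency the block tolerates
    transfer_bytes: int       # Q3: typical size of one transfer request
    nonblocking: bool         # Q4: can the block keep working while a request is pending?

    def input_bandwidth(self) -> int:
        """Sustained input bandwidth in bytes/sec."""
        return self.in_bytes_per_item * self.items_per_sec

    def output_bandwidth(self) -> int:
        """Sustained output bandwidth in bytes/sec."""
        return self.out_bytes_per_item * self.items_per_sec

    def requests_per_sec(self) -> float:
        """Transfer requests per second implied by the bandwidth and granularity."""
        return (self.input_bandwidth() + self.output_bandwidth()) / self.transfer_bytes


# Illustrative block: 1920x1080 pixels at 30 frames/sec, 3 bytes in and
# 2 bytes out per pixel, moved in 64-byte requests (all assumed numbers).
video = CommProfile(
    in_bytes_per_item=3,
    out_bytes_per_item=2,
    items_per_sec=1920 * 1080 * 30,
    max_latency_cycles=100,
    transfer_bytes=64,
    nonblocking=True,
)
print(video.input_bandwidth())   # 186624000 bytes/sec
print(video.output_bandwidth())  # 124416000 bytes/sec
print(video.requests_per_sec())  # 4860000.0 requests/sec
```

Halving `transfer_bytes` to 32 doubles the request rate for the same bandwidth, which shows why granularity matters as much as raw bandwidth whenever each request carries overhead.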


Basic Configurable, Extensible Processor Interfaces

The figure above highlights the three basic forms of interface natural to processors tuned for SOC applications:

  1. Memory-mapped, wide interface – typically implemented as a local-memory connection; ideal where high-bandwidth, low-latency data access is required.
  2. Memory-mapped, block-sized connection – typically implemented as a bus connection (the most popular, traditional processor interface).
  3. Instruction-mapped, arbitrary-sized connection – implemented as a direct point-to-point connection. Instruction-mapped connections can range from a single bit to thousands of bits wide and allow RTL-equivalent data transfer speeds. Only Tensilica’s Xtensa LX processor allows this direct communication.
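
A rough way to see why interface width matters is to count how many transactions each kind of interface needs to move one wide value. In this Python sketch the 32-bit shared bus comes from the text, while the 128-bit local-memory width and the 256-bit port width are illustrative assumptions. The model also ignores arbitration and wait states, which would only add cost on a shared bus.

```python
import math

# Interface widths in bits. The 32-bit shared bus is from the text; the
# 128-bit local-memory and 256-bit port widths are illustrative assumptions.
INTERFACES = {
    "shared bus (memory-mapped)": 32,
    "local memory (memory-mapped, wide)": 128,
    "direct port (instruction-mapped)": 256,
}


def transfers_needed(value_bits: int, width_bits: int) -> int:
    """Transactions needed to move one value of value_bits over an interface
    width_bits wide (no arbitration, wait states or protocol overhead)."""
    return math.ceil(value_bits / width_bits)


def compare(value_bits: int) -> dict:
    """Transaction count for the same value on every interface in INTERFACES."""
    return {name: transfers_needed(value_bits, w) for name, w in INTERFACES.items()}


for name, count in compare(256).items():
    print(f"{name}: {count}")
# shared bus: 8, local memory: 2, direct port: 1
```

A single-bit status flag costs one transaction everywhere, so the advantage of a wide port shows up for wide data types rather than for control bits.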

Direct processor-to-processor connections reduce cost and latency. Here’s a simple example of a direct connection, possible with Xtensa LX processors:

Direct Processor-to-Processor Port

The power of these direct connections, available only from Tensilica, shouldn’t be underestimated. These connections allow the Xtensa LX processor to communicate as fast and as flexibly as RTL blocks.

Tensilica’s Ports and Queues

Ports are wires that directly connect two Xtensa LX processors, or an Xtensa LX processor to external RTL. Port connections can be arbitrarily wide, so wide data types can be transferred without multiple load/store operations. As many as one million signals (1024 ports, each 1024 bits wide) can be instantiated. That may seem an outrageous number, far exceeding the performance demands of today’s real systems: it provides 350 terabits/sec of direct data flow per processor in a 130 nm CMOS process. But this huge capacity for I/O bandwidth clearly demonstrates that old notions of I/O bottlenecks inherent in processor-based design solutions are now obsolete.
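
The port figures can be checked with simple arithmetic. The text does not state a clock rate, so this Python sketch assumes every port transfers its full width once per cycle and works backward from the quoted 350 terabits/sec to the clock frequency that figure implies.

```python
PORTS = 1024                   # maximum port count, from the text
PORT_WIDTH_BITS = 1024         # maximum width per port, from the text
QUOTED_BANDWIDTH_BPS = 350e12  # 350 terabits/sec, from the text


def total_signals(ports: int, width_bits: int) -> int:
    """Total port wires instantiated on one processor."""
    return ports * width_bits


def aggregate_bandwidth_bps(ports: int, width_bits: int, clock_hz: float) -> float:
    """Bits/sec if every port moves its full width once per clock (assumption)."""
    return ports * width_bits * clock_hz


def implied_clock_hz(bandwidth_bps: float, ports: int, width_bits: int) -> float:
    """Clock rate needed for the given aggregate bandwidth, same assumption."""
    return bandwidth_bps / (ports * width_bits)


signals = total_signals(PORTS, PORT_WIDTH_BITS)
clock = implied_clock_hz(QUOTED_BANDWIDTH_BPS, PORTS, PORT_WIDTH_BITS)
print(signals)                # 1048576 signals, i.e. "one million"
print(round(clock / 1e6, 1))  # 333.8 (MHz)
```

Under this once-per-cycle assumption, the quoted 350 terabits/sec corresponds to a clock of roughly 334 MHz.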

While ports are ideal for quickly conveying control and status information, queues provide a high-speed mechanism for transferring streaming data. From the programmer’s viewpoint, input queues and output queues operate like traditional processor registers, with the notable exception that data is always available: there’s no need to load or store the data before and after computation, which saves valuable cycles in critical loops. Queues can sustain data rates as high as one transfer every clock cycle for each queue added to an Xtensa LX processor. Custom instructions can perform multiple queue operations per cycle, for example combining inputs from two input queues with local data and sending the computed values to two output queues. The high bandwidth and low control overhead of queues allow the Xtensa LX processor to be used in applications with extreme data rates.
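
The queue behavior described above can be modeled in a few lines. This Python sketch is a behavioral illustration, not TIE code or a Tensilica API: `Queue` stands in for a bounded hardware FIFO, and `fused_step` plays the role of a hypothetical custom instruction that, in one cycle, pops one value from each of two input queues, combines them with a local coefficient, and pushes one result to each of two output queues. The queue depths and the arithmetic are assumptions.

```python
from collections import deque


class Queue:
    """Bounded FIFO standing in for one hardware queue (depth is an assumption)."""

    def __init__(self, depth: int):
        self.depth = depth
        self.items = deque()

    def can_pop(self) -> bool:
        return len(self.items) > 0

    def can_push(self) -> bool:
        return len(self.items) < self.depth

    def push(self, value: int) -> None:
        if not self.can_push():
            raise OverflowError("queue full")
        self.items.append(value)

    def pop(self) -> int:
        return self.items.popleft()  # raises IndexError if empty


def fused_step(in_a: Queue, in_b: Queue, out_sum: Queue, out_diff: Queue, coeff: int) -> bool:
    """One 'cycle' of a hypothetical custom instruction: two queue pops and two
    queue pushes. Returns False (a stall) if any input is empty or any output
    is full, in which case no queue is touched."""
    ready = (in_a.can_pop() and in_b.can_pop()
             and out_sum.can_push() and out_diff.can_push())
    if not ready:
        return False
    a, b = in_a.pop(), in_b.pop()
    out_sum.push(coeff * (a + b))   # combine both inputs with local data
    out_diff.push(coeff * (a - b))
    return True


def run(a_vals, b_vals, coeff=2):
    """Preload the input queues, then issue fused_step once per cycle until it stalls."""
    depth = max(len(a_vals), len(b_vals), 1)
    in_a, in_b, out_sum, out_diff = (Queue(depth) for _ in range(4))
    for v in a_vals:
        in_a.push(v)
    for v in b_vals:
        in_b.push(v)
    cycles = 0
    while fused_step(in_a, in_b, out_sum, out_diff, coeff):
        cycles += 1
    return cycles, list(out_sum.items), list(out_diff.items)


print(run([1, 2, 3], [10, 20, 30]))  # (3, [22, 44, 66], [-18, -36, -54])
```

Three input pairs take three cycles, matching the one-transfer-per-cycle-per-queue rate. In hardware, an empty input queue would stall the processor until data arrived, rather than end the loop as this model does.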

Ports and queues specified by the designer are automatically added to the Xtensa LX processor and are fully modeled by Tensilica’s Xtensa Processor Generator. The full behavior of the port or queue, like any other modification made to the Xtensa LX processor, is automatically reflected in the custom software development tools, instruction set simulator, bus functional model and EDA scripts, all within about an hour. And because the process is automated using Tensilica’s patented technology, the result is pre-verified and correct by construction, so there is no need to re-verify the processor.

For more information, get a copy of the book “Engineering the Complex SOC: Fast, Flexible Design with Configurable Processors,” by Chris Rowen, published by Prentice Hall.
