Partitioning

Partitioning the Multiple Processor Design

There are several different ways to partition a design to employ multiple processors. The partitioning is very similar to the process most architects use to partition a SOC into blocks using RTL. However, by using processors instead of manually coded RTL logic blocks, major functions can remain flexible, allowing software changes to be made after the chip is fabricated to accommodate new standards or to add features.

To partition the design, SOC architects start from a set of tasks and apply a spectrum of techniques, including these four basic actions:

  1. Allocate (mostly) independent tasks to different processors, with communications among tasks expressed via shared memory and messages, or via dedicated ultra-high-performance datapaths for high-bandwidth, regular data flows.
  2. Speed up each individual task by optimizing the processor it runs on, extending that processor’s instruction set and register set.
  3. For tasks that are particularly performance-critical, decompose the task into a set of parallel tasks running on a set of optimized, inter-communicating processors.
  4. Combine multiple low-bandwidth tasks on one processor by time-slicing (multitasking). This approach reduces parallelism, but it may improve SOC cost and efficiency if the processor has enough spare computation cycles.
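The fourth action comes down to a cycle budget. As a rough illustration, the Python sketch below packs low-bandwidth tasks onto as few time-sliced processors as their combined load allows. It assumes each task's demand can be summarized as an average load in millions of cycles per second and that each processor reserves a fixed percentage of headroom for interrupt and context-switch overhead, and it uses a first-fit-decreasing bin-packing heuristic. The task names, loads, clock rate, and function names are all hypothetical, and a real design would also check worst-case latency, not just average load.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Task:
    name: str
    load: int  # hypothetical average demand, millions of cycles per second


@dataclass
class Processor:
    budget: float  # usable millions of cycles per second after headroom
    tasks: List[Task] = field(default_factory=list)

    @property
    def load(self) -> int:
        return sum(t.load for t in self.tasks)


def pack_tasks(tasks: List[Task], clock_mhz: int, headroom_pct: int = 20) -> List[Processor]:
    """Time-slice tasks onto processors with first-fit decreasing:
    place each task (largest first) on the first processor with enough
    spare cycles, and add a new processor only when none has room."""
    budget = clock_mhz * (100 - headroom_pct) / 100
    processors: List[Processor] = []
    for task in sorted(tasks, key=lambda t: t.load, reverse=True):
        if task.load > budget:
            # Too big to share: a candidate for an extended processor
            # (action 2) or for decomposition into parallel tasks (action 3).
            raise ValueError(f"{task.name} exceeds a single processor's budget")
        for proc in processors:
            if proc.load + task.load <= budget:
                proc.tasks.append(task)
                break
        else:  # no existing processor had room for this task
            processors.append(Processor(budget, [task]))
    return processors


if __name__ == "__main__":
    demo = [Task("keypad", 5), Task("display", 40), Task("audio", 120),
            Task("protocol", 90), Task("logging", 10)]
    for i, proc in enumerate(pack_tasks(demo, clock_mhz=200)):
        print(i, [t.name for t in proc.tasks], proc.load)
```

With a 200 MHz clock and 20% headroom, each processor has 160 million cycles per second to spend, so the five example tasks fit on two processors: audio and display on one (160), protocol, logging, and keypad on the other (105).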

These four methods interact with one another, so iterative refinement is often essential, particularly as the design evolves. Quick exploration of tradeoffs through trial system design, experimental processor configuration, and fast system simulation is especially important.

Forms of Partitioning

When a system’s functions are partitioned into multiple interacting function blocks, there are several possible organizational structures, including:

  • Heterogeneous tasks – distinct, loosely coupled subsystems that share modest amounts of common data or control information and can be implemented largely independently of each other. The chief system-level design concern is supplying adequate resources for all the requirements of the individual subsystems.

Simple Heterogeneous System Partitioning

The figure above shows one plausible topology for a system where networking, video, and audio processing tasks are implemented on separate processors that share common memory, bus, and I/O resources.

  • Parallel tasks – many identical, independent tasks. Communications infrastructure equipment, for example, often supports large numbers of wired communications ports, voice channels, or wireless frequency-band controllers. These tasks are easily divided into a number of identical subsystems, perhaps with some setup and management from a controller, as shown below.

Parallel Task System Partitioning



Even when the parallelism isn’t obvious, many system applications still lend themselves to parallel implementation. For example, an image-processing system may operate on a dependent series of frames, but the operations on one part of a frame may be largely independent of operations on another part of that same frame. Creating a two-dimensional array of sub-image processors can achieve high parallelism without substantial algorithm redesign.
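For the image-processing case, the mapping from a two-dimensional array of sub-image processors to regions of the frame can be sketched in Python as follows. This is an illustration under assumptions that are not in the original: tiles are rectangular and as equal in size as the frame allows, and each processor optionally reads a halo of extra pixels on every side (clipped at the frame border) so that neighborhood operations near tile edges do not have to wait on another processor's results. The function names are hypothetical.

```python
from typing import Dict, List, Tuple


def split_evenly(length: int, parts: int) -> List[Tuple[int, int]]:
    """Split [0, length) into `parts` contiguous, end-exclusive ranges
    whose sizes differ by at most one."""
    base, extra = divmod(length, parts)
    ranges, start = [], 0
    for i in range(parts):
        size = base + (1 if i < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


def tile_frame(height: int, width: int, grid_rows: int, grid_cols: int,
               halo: int = 0) -> Dict[Tuple[int, int], Tuple[int, int, int, int]]:
    """Map each processor (r, c) in a grid_rows x grid_cols array to the
    frame region it reads, as (row_start, row_end, col_start, col_end)
    with exclusive ends. `halo` adds extra pixels on every side, clipped
    to the frame, for operations that need neighboring pixels."""
    tiles = {}
    for r, (r0, r1) in enumerate(split_evenly(height, grid_rows)):
        for c, (c0, c1) in enumerate(split_evenly(width, grid_cols)):
            tiles[(r, c)] = (max(r0 - halo, 0), min(r1 + halo, height),
                             max(c0 - halo, 0), min(c1 + halo, width))
    return tiles


if __name__ == "__main__":
    for proc, region in sorted(tile_frame(1080, 1920, 2, 4, halo=2).items()):
        print(proc, region)
```

For a 1080 × 1920 frame on a 2 × 4 processor array, processor (0, 0) covers rows 0 to 539 and columns 0 to 479; with a 2-pixel halo, processor (1, 3) reads rows 538 to 1079 and columns 1438 to 1919. Only the halo pixels are shared between neighbors, which is what keeps the sub-image processors largely independent.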

  • Pipelined tasks – algorithms can often be organized naturally into phases, so that one phase of the algorithm operates on one block of data while a subsequent phase operates on an earlier block. (This arrangement is called a systolic-processing array.) Below is an example of a systolic array for decoding compressed video. The Huffman-decode processor pulls an encoded video stream out of memory, expands it, and passes the data through a dedicated queue to the inverse-discrete-cosine-transform (iDCT) processor. The iDCT processor performs a complex sequence of tasks on each image block and passes the block through a second queue to a motion-compensation processor, which combines the data with previous image data to produce the final decoded video stream.

Pipelined Task System Partitioning

  • Hybrids – the above three system-partitioning cases are unrealistically simple. Real systems usually require a mixture of these partitioning styles.
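The Huffman-decode, iDCT, and motion-compensation pipeline described above can be modeled in software to check the dataflow before committing to hardware. The Python sketch below runs each stage in its own thread and connects the stages with bounded queues, which stand in for the dedicated hardware queues: a full queue blocks the stage feeding it, so a slow stage throttles everything upstream. The stage functions are hypothetical placeholders that only tag each block, and Python threads illustrate the structure here rather than real parallel speedup.

```python
import queue
import threading

SENTINEL = None  # marks the end of the stream


def stage(work, inbox, outbox):
    """One pipeline stage: take a block from `inbox`, process it with
    `work`, and pass the result to `outbox` until the stream ends."""
    while True:
        block = inbox.get()
        if block is SENTINEL:
            outbox.put(SENTINEL)  # propagate shutdown downstream
            return
        outbox.put(work(block))


# Hypothetical placeholders; real stages would perform Huffman decoding,
# the inverse DCT, and motion compensation against previous image data.
def huffman_decode(encoded):
    return ("coeffs", encoded)


def idct(coeffs):
    return ("pixels", coeffs[1])


def motion_compensate(pixels):
    return ("frame_block", pixels[1])


def run_pipeline(encoded_blocks, depth=2):
    # Bounded queues model fixed-depth hardware FIFOs between stages.
    q_in, q1, q2, q_out = (queue.Queue(maxsize=depth) for _ in range(4))

    def feed():
        # Plays the role of memory supplying the encoded stream.
        for block in encoded_blocks:
            q_in.put(block)
        q_in.put(SENTINEL)

    workers = [threading.Thread(target=feed)]
    workers += [threading.Thread(target=stage, args=args) for args in
                [(huffman_decode, q_in, q1), (idct, q1, q2),
                 (motion_compensate, q2, q_out)]]
    for w in workers:
        w.start()

    results = []
    while True:  # drain the final queue so upstream stages never stall
        item = q_out.get()
        if item is SENTINEL:
            break
        results.append(item)
    for w in workers:
        w.join()
    return results


if __name__ == "__main__":
    print(run_pipeline(["b0", "b1", "b2"]))
```

Because each stage handles blocks in FIFO order, output order matches input order; while the iDCT stage works on one block, the Huffman stage is already decoding the next.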

For more information, get a copy of the book “Engineering the Complex SOC: Fast, Flexible Design with Configurable Processors,” by Chris Rowen, published by Prentice Hall.
