Partitioning the Multiple Processor Design
There are several different ways to partition
a design to employ multiple processors. The partitioning
is very similar to the process most architects
use to partition a SOC into blocks using RTL. However,
by using processors instead of manually coded RTL
logic blocks, major functions can remain flexible,
allowing software changes to be made after the
chip is fabricated to accommodate new standards
or to add features.
To partition the design, SOC architects start
from a set of tasks and apply a spectrum of techniques,
including these four basic actions:
- Allocate (mostly) independent tasks to different
processors, with communications among tasks expressed
via shared memory and messages, or via dedicated
ultra-high-performance datapaths (ports and queues)
for high-bandwidth, regular data flows.
- Speed up each individual task by optimizing
the processor on which it runs by extending the
processor’s instruction set and register
set.
- For tasks that are particularly performance-critical,
decompose the task into a set of parallel tasks
running on a set of optimized, inter-communicating
processors.
- Combine multiple low-bandwidth tasks on one
processor by time-slicing (multitasking). This
approach degrades parallelism, but may improve
SOC cost and efficiency if a processor has sufficient
computation cycles available.
These four methods interact with one another,
so iterative refinement is often essential, particularly
as the design evolves. Quick exploration of tradeoffs
through trial system design, experimental processor
configuration, and fast system simulation is especially
important.
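As a rough illustration of the first and fourth actions, the sketch below
packs tasks onto as few processors as a cycle budget allows. It is not a
method from this article: the packing rule is plain first-fit decreasing,
and the task names, the MHz figures, and the pack_tasks helper are all
hypothetical.

```python
def pack_tasks(task_mhz, budget_mhz):
    """Group tasks onto processors with first-fit decreasing packing.

    task_mhz   -- dict of task name -> estimated cycle demand in MHz (assumed)
    budget_mhz -- cycles each processor can spare for tasks, in MHz (assumed)
    Returns a list of task-name lists, one list per processor.
    """
    processors = []  # each entry is [remaining_mhz, [task names]]
    # Place the most demanding tasks first so small tasks fill the gaps.
    for name, mhz in sorted(task_mhz.items(), key=lambda kv: kv[1], reverse=True):
        if mhz > budget_mhz:
            # Too big to time-slice; it would need a faster, extended,
            # or parallel implementation instead.
            raise ValueError(f"{name} needs {mhz} MHz, budget is {budget_mhz} MHz")
        for proc in processors:
            if proc[0] >= mhz:
                proc[0] -= mhz
                proc[1].append(name)
                break
        else:
            # No existing processor has room, so allocate another one.
            processors.append([budget_mhz - mhz, [name]])
    return [names for _, names in processors]


# Hypothetical demands for a small multimedia SOC.
demands = {"video": 70, "network": 40, "audio": 20, "ui": 5, "keypad": 2}
groups = pack_tasks(demands, budget_mhz=80)
print(groups)
```

Low-bandwidth tasks such as the keypad scan end up time-sliced onto a
processor that already has spare cycles, while the heavier tasks keep
processors of their own. A real flow would repeat this estimate as
processor configurations and task profiles change.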
Forms of Partitioning
When a system’s functions are partitioned
into multiple interacting function blocks, there
are several possible organizational structures
including:
- Heterogeneous tasks – distinct, loosely
coupled subsystems that share modest amounts
of common data or control information and can
be implemented largely independently of each
other. The chief system-level design concern
is supplying adequate resources for all the requirements
of the individual subsystems.
Simple Heterogeneous System Partitioning
The figure above shows one plausible topology
for a system where networking, video and audio
processing tasks are implemented in separate processors,
sharing common memory, bus, and I/O resources.
- Parallel tasks – communications
infrastructure equipment, for example, often
supports large numbers of wired communications
ports, voice channels, or wireless frequency-band
controllers. These tasks are easily divided into
a number of identical subsystems, perhaps with
some setup and management from a controller,
as shown below.
Parallel Task System Partitioning

Even when the parallelism isn’t obvious, many system
applications still lend themselves to parallel
implementation. For example, an image-processing
system may operate on a dependent series of frames,
but the operations on one part of a frame may be
largely independent of operations on another part
of that same frame. Creating a two-dimensional
array of sub-image processors can achieve high
parallelism without substantial algorithm redesign.
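The sub-image idea can be sketched in a few lines. This is only an
illustration under stated assumptions: worker threads stand in for the
sub-image processors, a pixel inversion stands in for the real per-tile
operation, and tile_bounds, invert_tile, and process_frame are
hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def tile_bounds(width, height, cols, rows):
    """Return (x0, y0, x1, y1) for each tile of a cols x rows grid.

    Integer division spreads any remainder across the tiles, so the
    tiles cover the frame exactly once with no overlap.
    """
    tiles = []
    for r in range(rows):
        y0, y1 = r * height // rows, (r + 1) * height // rows
        for c in range(cols):
            x0, x1 = c * width // cols, (c + 1) * width // cols
            tiles.append((x0, y0, x1, y1))
    return tiles


def invert_tile(frame, out, bounds):
    """Stand-in per-tile operation: invert 8-bit pixel values."""
    x0, y0, x1, y1 = bounds
    for y in range(y0, y1):
        for x in range(x0, x1):
            out[y][x] = 255 - frame[y][x]


def process_frame(frame, cols=2, rows=2):
    """Process one frame with one worker per tile."""
    height, width = len(frame), len(frame[0])
    out = [[0] * width for _ in range(height)]
    with ThreadPoolExecutor(max_workers=cols * rows) as pool:
        futures = [pool.submit(invert_tile, frame, out, b)
                   for b in tile_bounds(width, height, cols, rows)]
        for f in futures:
            f.result()  # re-raise any worker error
    return out


frame = [[x + 10 * y for x in range(5)] for y in range(3)]
result = process_frame(frame)
```

Each worker writes only to its own tile of the output, so the tiles
need no coordination within a frame. Operations that read neighboring
pixels would need overlapping tile borders, which this sketch omits.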
- Pipelined tasks – algorithms can often be
organized naturally into phases, so that one phase
of the algorithm can be performed on one block of
data while a subsequent phase is performed on an
earlier block. (This arrangement is called a
systolic-processing array.) Below is an example of
a systolic array for decoding compressed video.
The Huffman-decode processor pulls an encoded video
stream out of memory, expands it, and passes the
data through a dedicated queue to the
inverse-discrete-cosine-transform (iDCT) processor,
which performs a complex sequence of tasks on the
image block and passes that block through a second
queue to a motion-compensation processor, which
combines the data with previous image data to
produce the final decoded video stream.
Pipelined Task System Partitioning
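The queue-connected structure described above can be mimicked in
software. The sketch below is a minimal model, not a decoder: threads
stand in for the three processors, bounded queue.Queue objects stand in
for the dedicated hardware queues, and the three stage functions are
hypothetical placeholders that only tag each block with the stage name.

```python
import queue
import threading

DONE = object()  # end-of-stream marker passed down the pipeline


def stage(work, inbox, outbox):
    """Run one pipeline stage: take a block, process it, pass it on."""
    while True:
        item = inbox.get()
        if item is DONE:
            outbox.put(DONE)
            return
        outbox.put(work(item))


def run_pipeline(blocks, stages, depth=2):
    """Push blocks through the stages; each queue holds `depth` blocks."""
    queues = [queue.Queue(maxsize=depth) for _ in range(len(stages) + 1)]

    def feed():
        for block in blocks:
            queues[0].put(block)  # blocks when the first queue is full
        queues[0].put(DONE)

    threads = [threading.Thread(target=feed)]
    threads += [threading.Thread(target=stage,
                                 args=(work, queues[i], queues[i + 1]))
                for i, work in enumerate(stages)]
    for t in threads:
        t.start()

    results = []
    while True:
        item = queues[-1].get()
        if item is DONE:
            break
        results.append(item)
    for t in threads:
        t.join()
    return results


# Placeholder stages: each one just records that it handled the block.
def huffman_decode(block):
    return f"decoded({block})"


def idct(block):
    return f"idct({block})"


def motion_compensate(block):
    return f"mc({block})"


output = run_pipeline(["blk0", "blk1", "blk2"],
                      [huffman_decode, idct, motion_compensate])
print(output)
```

While the last stage works on blk0, the middle stage can already hold
blk1 and the first stage blk2, which is the overlap the pipelined
partitioning relies on. The bounded queues apply back-pressure, so a
slow stage stalls the stages feeding it rather than letting data pile up.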

- Hybrids – the above three system-partitioning
cases are unrealistically simple. Real systems
usually require a mixture of these partitioning
styles.
For more information, get a copy of the book “Engineering
the Complex SOC: Fast, Flexible Design with
Configurable Processors,” by Chris Rowen, published
by Prentice Hall.