Choosing the Right Communications Structure
Once the rough number and types of processors
are known and tasks are tentatively assigned to
the processors, basic communication structure
design starts. The goal is to discover the least
expensive communications structure that satisfies
the bandwidth and latency requirements of the
tasks, including changes in the task load that
may occur as the SOC’s use evolves over
time or across a variety of target systems.
When low cost and good flexibility are most
important, a shared-bus architecture, in which
all resources are connected to one bus, may be
most appropriate. Buses have two significant
advantages: they tend to have low hardware complexity
and bus-design issues are familiar to most designers.
The glaring liability of the shared bus is long
and unpredictable latency, particularly when
a number of bus masters contend for access to
different shared resources.
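A rough feasibility check for a shared bus can be sketched in C as below. This is only a back-of-the-envelope model, not a method taken from the text: it assumes round-robin arbitration and a fixed maximum burst length, and the function names and units are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed model: round-robin arbitration in which every grant lasts at
 * most max_burst_cycles. In the worst case a requester waits for each
 * of the other masters to finish one burst before it gets the bus. */
unsigned worst_case_wait_cycles(unsigned n_masters, unsigned max_burst_cycles)
{
    if (n_masters == 0)
        return 0;
    return (n_masters - 1u) * max_burst_cycles;
}

/* Returns true if the summed bandwidth demand of all tasks fits within
 * the bus capacity. Units (MB/s here) are an assumption; any consistent
 * unit works. */
bool bus_bandwidth_ok(const unsigned *task_mbytes_per_s, size_t n_tasks,
                      unsigned bus_mbytes_per_s)
{
    assert(task_mbytes_per_s != NULL || n_tasks == 0);
    unsigned long total = 0;
    for (size_t i = 0; i < n_tasks; i++)
        total += task_mbytes_per_s[i];
    return total <= bus_mbytes_per_s;
}
```

Under this model the worst-case wait grows linearly with the number of bus masters, which is one way to see why latency becomes the limiting factor as masters are added.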
When the biggest challenge is total communications
throughput with flexibility, the preferred structure
is a general-purpose parallel communications
network. A crossbar connection is the most common
example; a two-level hierarchy of buses is another.
A simple example of a mesh topology, with nine
processors, is shown below.

General-Purpose Parallel Communications Style: On-Chip Mesh Network
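For a nine-processor mesh like the one in this example, the distance between two processors can be estimated with a short C sketch. It assumes the nodes are numbered 0 to 8 in row-major order and that messages travel along rows and then columns; both the numbering and the routing scheme are assumptions made for illustration.

```c
#include <assert.h>
#include <stdlib.h>

#define MESH_WIDTH 3                          /* 3 x 3 grid: nine processors */
#define MESH_NODES (MESH_WIDTH * MESH_WIDTH)

/* Number of hops between two nodes, assuming row-major numbering and
 * routing that moves only horizontally and vertically, so the hop count
 * is the Manhattan distance between the two grid positions. */
int mesh_hops(int src, int dst)
{
    assert(src >= 0 && src < MESH_NODES);
    assert(dst >= 0 && dst < MESH_NODES);
    int dx = src % MESH_WIDTH - dst % MESH_WIDTH;
    int dy = src / MESH_WIDTH - dst / MESH_WIDTH;
    return abs(dx) + abs(dy);
}
```

In this model neighboring processors are one hop apart and opposite corners are four hops apart, so latency depends on where communicating tasks are placed.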
When the communication pattern is well known
at design time and likely to be stable, the architect
can optimize the communications around that particular
pattern of data flow. The drawing below shows
the direct connections made when the communications
between the processors are well understood and
will not change.

Optimized Direct Parallel Communications
Communications = Software Mode + Hardware Interconnect
Intertask communications are built on two foundations:
the software communications mode and the corresponding
hardware mechanism. The three basic styles of
software communications between tasks are
message passing, shared memory, and device drivers.
Message passing makes all communication between
tasks overt. All data is private to a task except
when operands are sent by one task and received
by another task. The send/receive model implies
a queue; messages cannot be sent if the output
queue is full and cannot be received if the input
queue is empty. Hardware queues give the lowest
latency and processor overhead, especially for
small, fixed-length messages such as simple operands.
Message passing is generally easier to code than
shared-memory communications techniques when
the tasks are largely independent, but is often
harder to code efficiently when the tasks are
very tightly coupled.
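The send/receive behavior described above can be sketched in C as a small fixed-depth queue. This is a single-threaded software model for illustration only; the depth, type names, and function names are hypothetical, and a real hardware queue would also handle concurrent access by the sender and receiver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define QUEUE_DEPTH 4u   /* hypothetical queue depth */

typedef struct {
    uint32_t slots[QUEUE_DEPTH];
    unsigned head;    /* index of the oldest queued message */
    unsigned count;   /* number of messages currently queued */
} msg_queue;

/* Send one fixed-length operand. Fails (returns false) if the queue is
 * full, mirroring "messages cannot be sent if the output queue is full". */
bool msg_send(msg_queue *q, uint32_t operand)
{
    assert(q != NULL);
    if (q->count == QUEUE_DEPTH)
        return false;
    q->slots[(q->head + q->count) % QUEUE_DEPTH] = operand;
    q->count++;
    return true;
}

/* Receive the oldest operand. Fails (returns false) if the queue is
 * empty, leaving *operand untouched. */
bool msg_receive(msg_queue *q, uint32_t *operand)
{
    assert(q != NULL && operand != NULL);
    if (q->count == 0)
        return false;
    *operand = q->slots[q->head];
    q->head = (q->head + 1u) % QUEUE_DEPTH;
    q->count--;
    return true;
}
```

A sending task would typically retry or stall when msg_send returns false, and a receiving task would do the same when msg_receive returns false.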
With shared-memory communications, only one
task reads from or writes to the data buffer
in memory at a time. Successful use of shared
memory requires explicit access synchronization.
A destination task must know when the sourcing
task has written valid data or else old data
may be read. Embedded software languages, such
as C, typically include features that ease shared-memory
programming.
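One way to express the required synchronization in C is a "valid" flag built on the C11 <stdatomic.h> facilities. The sketch below is one possible arrangement, not the only one: the buffer size and all names are hypothetical, and it assumes a single producer task and a single consumer task.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define BUF_WORDS 8   /* hypothetical buffer size */

typedef struct {
    int data[BUF_WORDS];
    atomic_bool valid;   /* true once the producer has published new data */
} shared_buf;

void shared_init(shared_buf *b)
{
    assert(b != NULL);
    memset(b->data, 0, sizeof b->data);
    atomic_init(&b->valid, false);
}

/* Producer side: refuses to overwrite data the consumer has not yet
 * taken. The release store publishes the buffer contents along with
 * the flag. */
bool shared_write(shared_buf *b, const int *src)
{
    assert(b != NULL && src != NULL);
    if (atomic_load_explicit(&b->valid, memory_order_acquire))
        return false;
    memcpy(b->data, src, sizeof b->data);
    atomic_store_explicit(&b->valid, true, memory_order_release);
    return true;
}

/* Consumer side: reads only after the flag says the data is valid, so
 * stale data is never returned, then hands the buffer back. */
bool shared_read(shared_buf *b, int *dst)
{
    assert(b != NULL && dst != NULL);
    if (!atomic_load_explicit(&b->valid, memory_order_acquire))
        return false;
    memcpy(dst, b->data, sizeof b->data);
    atomic_store_explicit(&b->valid, false, memory_order_release);
    return true;
}
```

The flag ensures that only one task touches the buffer at a time: the producer owns it while the flag is false, and the consumer owns it while the flag is true.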
The hardware-device-plus-software-device-driver
model is most commonly used with complex I/O
interfaces, such as networks or storage devices.
The device-driver mode combines elements of message
passing and shared-memory access. The principles
of the device driver can be applied to almost
any pair of communicating tasks, especially where
the interface between tasks looks like a series
of requests and responses.
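The request-and-response pattern can be sketched in C as follows. The device here is a simulated block store, and every type, field, and function name is hypothetical. The small request structure plays the role of a message, while the payload buffer is shared by pointer, which is how this style mixes the two other modes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEV_BLOCKS  4    /* hypothetical device geometry */
#define BLOCK_BYTES 16

typedef enum { REQ_READ, REQ_WRITE } req_op;
typedef enum { RESP_OK, RESP_BAD_ADDRESS } resp_status;

/* The request is a small message; buf points at a buffer shared
 * between the requesting task and the driver. */
typedef struct {
    req_op   op;
    uint32_t block;
    uint8_t *buf;        /* must hold BLOCK_BYTES bytes */
} dev_request;

typedef struct {
    resp_status status;
} dev_response;

/* Simulated device: stands in for real hardware in this sketch. */
typedef struct {
    uint8_t storage[DEV_BLOCKS][BLOCK_BYTES];
} fake_device;

/* Handle one request and return one response. */
dev_response driver_submit(fake_device *dev, const dev_request *req)
{
    assert(dev != NULL && req != NULL && req->buf != NULL);
    dev_response resp = { RESP_OK };
    if (req->block >= DEV_BLOCKS) {
        resp.status = RESP_BAD_ADDRESS;
        return resp;
    }
    if (req->op == REQ_WRITE)
        memcpy(dev->storage[req->block], req->buf, BLOCK_BYTES);
    else
        memcpy(req->buf, dev->storage[req->block], BLOCK_BYTES);
    return resp;
}
```

The same shape applies to a pair of communicating tasks: one task fills in a request, the other services it and returns a status.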
For more information, get a copy of the book “Engineering
the Complex SOC: Fast, Flexible Design with
Configurable Processors,” by Chris
Rowen, published by Prentice Hall.