Catching Up with Moore’s Law:
How to Fully Exploit the Benefits of Nanometer Silicon
In 1965, Gordon Moore prophesied that
integrated circuit density would double roughly
every one to two years. The universal acceptance
and relentless tracking of this trend set a
grueling pace for all chip developers. This
trend makes transistors ever cheaper and faster
(good) but also invites system buyers to expect
constant improvements to functionality, battery
life, throughput, and cost (not so good). The
moment a new function is technically feasible,
the race is on to deliver it. Today, it is
perfectly feasible to build SOC devices with
more than 100 million transistors, and within
a couple of years we’ll see billion-transistor
chips built for complex applications, combining
processors, memory, logic, and interfaces.
High integration creates a terrific opportunity.
The remarkable characteristics of CMOS silicon
scaling allow the cost, size, performance, and
power for a given function to all improve simultaneously.
This scaling allows continuous improvement in
end-product benefits: longer battery life, smaller
size, more functionality, and higher user productivity.
This scaling has been a primary driver for the
parallel revolutions in digital consumer electronics,
personal computing, and the Internet. Moreover,
most observers expect the scaling trend to continue
for at least another 15 years.
The growth in available transistors creates
a fundamental role for concurrency in SOC designs.
Different tasks, such as audio and video processing
and network-protocol stack management, can operate
largely independently of one another. Complex
tasks with inherent internal execution parallelism
can be decomposed into a tightly-coupled collection
of sub-tasks operating in parallel to perform
the same work as the original non-parallel task
implementation. This kind of concurrency offers
the potential for significant improvements in
application latency, data bandwidth, and energy
efficiency when compared to serial execution
of the same collection of tasks with a single
computational resource.
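As a rough illustration of the latency argument above, the
following Python sketch compares serial execution of a set
of independent tasks on one computational resource with
concurrent execution on one resource per task. The task
names and durations are invented for illustration, and the
model deliberately ignores communication and synchronization
costs, so it shows an upper bound rather than a prediction.

```python
# Hypothetical task latencies in milliseconds (illustrative values only).
TASKS_MS = {"audio": 4.0, "video": 12.0, "network_stack": 6.0}


def serial_latency(tasks):
    """One shared resource runs every task back to back."""
    return sum(tasks.values())


def concurrent_latency(tasks):
    """One resource per task: independent tasks finish when the slowest does."""
    return max(tasks.values())


def speedup(tasks):
    """Ratio of serial to concurrent latency for the same task set."""
    return serial_latency(tasks) / concurrent_latency(tasks)


if __name__ == "__main__":
    print(serial_latency(TASKS_MS), concurrent_latency(TASKS_MS), speedup(TASKS_MS))
```

In this simplified model the gain is limited by the longest
single task, which is why decomposing a complex task into
parallel sub-tasks matters as much as running separate tasks
side by side.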
If high silicon integration is a terrific opportunity,
then the design task must be recognized as correspondingly
terrifying. Three forces work together to make
chip design tougher and tougher. First, the astonishing
success of semiconductor manufacturers in tracking
Moore’s Law gives designers twice as many
gates to play with every two years. Second, the
continuous improvement in process geometry and
circuit characteristics motivates chip builders
to design with new IC fabrication technologies
as they become available. Third, and perhaps most
important, the end markets for electronic products—consumer,
computing, and communications systems—are
in constant churn, demanding a constant stream
of new functions and performance to justify new
purchases.
As a result, the design “hill” keeps
getting steeper. Certainly, improved chip-design
tools help: faster RTL simulation, higher-capacity
logic synthesis, and better block placement
and routing all mitigate some of the difficulties.
Similarly, the movement towards systematic logic
design reuse can reduce the amount of new design
that must be done for each chip.
But all these improvements fail to close the
design gap. This well-recognized phenomenon is
captured in the Semiconductor Research Corporation’s
simple comparison of the growth in logic complexity
and designer productivity in Figure 1 below.

Figure 1. Design complexity and designer productivity.
Even as designers wrestle with the growing resource
demands of advanced chip design, they face two
additional worries:
- How do design teams ensure that the chip
specification really satisfies customer needs?
- How do design teams ensure that the chip
really meets those specifications?
Further, a good design team will also anticipate
the future needs of current customers and
potential future customers; it has a built-in
road map.
Roadblock 1: Building the Wrong Chip (Inflexibility)
If the design team fails on the first criterion
listed above, the chip may work perfectly,
but will have inadequate sales to justify the
design expense and manufacturing effort. Changes
in requirements may be driven by demands of specific
key customers or may reflect overall market
trends such as the emergence of new data-format
standards or new feature expectations across
an entire product category. While most SOC designs
include some form of embedded control processor,
the limited performance of those processors often
precludes them from being used for essential
data-processing tasks, so software usually
cannot be used to add or change fundamental
features.
Roadblock 2: Building the Chip Wrong (Failed
Design Process)
If the design team fails on the
second criterion listed above, additional time
and resources must go towards changing or fixing
the design. This resource diversion delays market
entry and causes companies to miss key customer
commitments. The failure is most often realized
as a program delay. This delay may come in the
form of missed integration or verification milestones,
or it may come in the form of hardware bugs—explicit
logic errors that are not caught in the limited
verification coverage of typical hardware simulation.
The underlying cause might be a subtle error
in a single design element, or it might be a
miscommunication of requirements—subtle
differences in assumptions between hardware and
software teams, between design and verification
teams, or between SOC designer and SOC library
or foundry supplier. In any case, the design
team may often be forced into an urgent cycle
of re-design, re-verification, and re-fabrication
of the chip. These design “spins” rarely
take less than six months, causing significant
disruption to product and business plans.
To improve the design process, we must consider
simultaneous changes in all three interacting
dimensions of the design environment: design
elements, design tools, and design methodology.
- Design elements are the basic building blocks,
the silicon structures, and the logical elements
that form the basic vocabulary of design expression.
Historically, these blocks have been basic
logic functions (NAND and NOR gates and flip-flops),
plus algorithms written in C and assembly code
running on RISC microprocessors and digital
signal processors.
- Design tools are the application
programs and techniques that designers use
to capture, verify, refine, and translate design
descriptions for particular tasks and subsystems.
Historically, tools such as RTL compilation
and verification, code assemblers and compilers,
and standard-cell placement and routing have
comprised the essential tool box for complex
chip design.
- Design methodology is the design team’s
strategy for combining the available elements
and tools into a systematic process for implementing
the target silicon and software. A methodology
specifies what elements and tools are available,
describes how the tools are used at each step
of the design refinement, and outlines the
sequence of design steps. The current dominant
SOC design methodology is built around four
major steps, typically implemented in the following
order: hardware-software partitioning; detailed
RTL block design and verification; chip integration
of RTL blocks, processors, and memories; and
post-silicon software bring-up.
Changes in any one dimension are unlikely to
prevent the pitfalls of SOC design: “building
the wrong chip” or “building the chip wrong.”
Piecemeal improvements in RTL design or
software development tools cannot solve the
larger design problem. Instead, it is necessary
to change the design problem itself. The design
elements, key tools, and the surrounding
methodology must all change together.
The Fundamental Trends of SOC Design
Several basic trends suggest that the engineering community
needs a new approach for SOC design. The first
trend is the seemingly inexorable growth in silicon
density, which underlies the fundamental economics
of building electronic products in the 21st century.
At the center of this trend is the fact that
the semiconductor industry seems willing and
able to continue to push chip density by consistent,
sustained innovation through smaller transistor
sizes, smaller interconnect geometries, higher
transistor speed, significantly lower cost, and
lower power dissipation over a long period of
time. Technical challenges for scaling abound.
Issues of power dissipation, nanometer lithography,
signal integrity, and interconnect delay all
will require significant innovation. Past experience
suggests that these challenges, at worst, will
only marginally slow down the pace of scaling.
The central question remains this: How will
we design what Moore’s Law scaling makes
technically achievable?
This silicon scaling trend stimulates the second
trend—the drive to take this available
density and actually integrate into one piece
of silicon the enormous diversity and huge number
of functions required by modern electronic products.
The increasing integration level creates the
possibility of taking all the key functions associated
with a network switch, or a digital camera, or
a personal information appliance, and putting
these functions—all of the logic, all of
the memory, all of the interfaces, in fact almost
everything electronic in the end-product—into
one piece of silicon, or something close to it.
The benefits of high silicon integration levels
are clear. Tight integration drives the end product’s
form factor, making complex systems small enough
to put into your pocket, inside your television,
or in your car. High integration levels also
drive down power dissipation, making more end
products battery powered, fan-less, or available
for use in a much wider variety of environments.
Ever increasing integration levels drive the
raw performance—in terms of how quickly
a product will accomplish tasks or in terms of
the number of different functions that a product
can incorporate—ever upward. These attributes
are, in fact, likely to become even more important
product features, ideally enough to make the
average consumer rush to their favorite retailer
to buy new products to replace the old ones.
A New SOC for Every System is a Bad Idea
The silicon specialization resulting from
higher and higher integration creates an economic
challenge for the product developer. If all
of the electronics in a system are embodied in
roughly one chip, that chip is increasingly likely
to be a direct reflection of the end product
the designer is trying to define. Such a chip
design lacks flexibility. It cannot be used in
a wide variety of products.
In the absence of some characteristic that makes
that highly integrated chip significantly more
flexible and reusable, SOC design moves towards
a direct 1:1 correspondence between chip design
and system design. Ultimately, if SOC design
were to really go down this road, the time to
develop a new system and the amount of engineering
resources required to build a new system would,
unfortunately, become at least as great as the
time and costs to build new chips.
In the past, product designers built systems
by combining chips onto large printed circuit
boards (PCBs). Different systems used different
combinations of (mostly) off-the-shelf chips
soldered onto system-specific PCBs. The approach
worked because a wide variety of silicon components
were available and because PCB design and prototyping
were easy. System reprogrammability was relatively
unimportant because system redesign was relatively
cheap and quick.
In the world of nanometer silicon technology,
the situation is dramatically different. Demands
for smaller physical system size, greater energy
efficiency and lower manufacturing cost all make
the large PCB obsolete. Volume-oriented end-product
requirements can only be satisfied with system-on-chip
designs. Even when appropriate “virtual
components” are available as SOC building
blocks, SOC design integration and prototyping
are more than two orders of magnitude more expensive
than PCB design and prototyping. Moreover, SOC
design changes take months while PCB changes
take just days. SOC design is mandatory to reap
the benefits of nanometer silicon, but to make
SOC design practical, SOCs cannot be built like
PCBs. The problem of SOC inflexibility must be
addressed.
Chip-level inflexibility is really a crisis
in reusability of the chip’s hardware design.
Despite substantial industry attention to the
benefits of block-level hardware reuse (IP reuse),
the growth in internal complexity of blocks coupled
with the complex interactions among blocks has
limited the systematic and economical reuse of
IP blocks.
Too often customer requirements, implemented
standards, and the necessary interfaces to other
functions must evolve with each product variation.
These boundaries constrain successful block reuse
to two categories: simple blocks that implement
stable interface functions and inherently flexible
functions that can be implemented in processors,
whose great flexibility and adaptability are
realized via software programmability.
On the other hand, a requirement to build new
chips for every system would be an economic disaster
for system developers because there’s no
question that building chips is hard. We can
improve the situation somewhat with better chip-design
tools, but in the absence of some significant
innovation in chip-design methodology, the situation’s
not getting better very fast.
In fact, it would appear that in the absence
of some major innovation, the effort required
to design a chip will increase more rapidly than
the transistor complexity of the chip itself.
We’re losing ground in systems design because
innovation in design methodology is lacking.
We cannot afford to lose ground on this problem,
as system and chip design grow closer together.
SOC Design Reform: Lower Design Cost and Greater
Design Flexibility
System developers are trying
to solve two closely related problems:
- To develop system designs with significantly
fewer resources by making it much, much easier
to design the chips in those systems.
- To make SOCs more adaptable so not every new
system design requires a new SOC design.
The way to solve these two problems is to make the SOC
sufficiently programmable so that one chip design
will efficiently serve 10, or 100, or even 1000
different system designs while giving up none
or perhaps just a few of the benefits of integration.
Solving these problems means having chips available
off the shelf to satisfy the requirements of
the next system design and amortize the costs
of chip development over a large number of system
designs.
These trends constitute the force behind the
need for a fundamental shift in IC design. That
fundamental shift will ideally provide both a
big reduction in the effort needed to design
SOCs (not just the silicon but also the required
software) and an increase in the intrinsic
flexibility of SOC designs, so that the design
effort can be shared across many system designs.
Economic success in the electronics industry
hinges on the ability to make future SOCs more
flexible and more highly optimized at the same
time. The core dilemma for the SOC industry and
for all the users of SOC devices is really simultaneous
management of flexibility and optimality.
SOC developers are trying to minimize chip design
costs and trying to get closer and closer to
the promised benefits of high-level silicon integration
at the same time. Consequently, they need to
take full advantage of what high-density silicon
offers and, at the same time, they need to overcome
or mitigate the issues created by the sheer complexity
of those SOC designs and the high costs and risks
associated with the long SOC development cycle.
Programmability allows SOC designers to substantially
mitigate the costs and risks of complex SOC designs,
by accelerating the initial development effort
and by easing the effort to accommodate subsequent
revision of system requirements.
Programmability
Rising system complexity also
makes programmability essential to SOCs, and
the more efficient programming becomes, the
more pervasive it will be. The market already
offers a wide range of possible ways to achieve
system programmability including field-programmable
gate arrays (FPGAs), standard microprocessors,
and reconfigurable logic.
Programmability’s benefits come at two
levels. First, programmability increases the
likelihood that a pre-existing design can meet
the performance, efficiency, and functional requirements
of the system. If there is a fit, no new SOC
development is required—an existing platform
will serve. Second, programmability means that
even when a new SOC must be designed, more of
the total functions are implemented in a programmable
fashion, reducing the design risk and effort.
The successes of both the FPGA and processor
markets are traceable to these factors.
The programming models for different platforms
differ widely. Traditional processors (including
DSPs) can execute applications of unbounded complexity,
though as complexity grows, performance typically
suffers. Processors typically use sophisticated
pipelining and circuit-design techniques to achieve
high clock frequency, but achieve only modest
parallelism—one (or a few) operations per
clock cycle. FPGAs, by contrast, have finite
capacity—once the problem grows beyond
some level of complexity, the problem will not
fit in an FPGA at all. On the other hand, FPGAs
can implement algorithms with very high levels
of intrinsic parallelism, sometimes performing
the equivalent of hundreds of operations per
cycle. FPGAs typically operate at more modest
clock rates than processors and tend to have
larger die sizes and higher chip costs than processors
used in the same applications.
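The contrast drawn above between processors and FPGAs comes
down to a product of two factors: operations completed per
clock cycle and clock rate. The Python sketch below makes
that arithmetic explicit. The clock rates and per-cycle
operation counts are hypothetical values chosen only to
match the qualitative description (a few operations per
cycle at a high clock rate versus hundreds per cycle at a
more modest one); they are not measurements of any real
device.

```python
def ops_per_second(ops_per_cycle, clock_hz):
    """Sustained throughput = operations per cycle x cycles per second."""
    return ops_per_cycle * clock_hz


# Hypothetical figures, for illustration only.
PROCESSOR = {"ops_per_cycle": 2, "clock_hz": 400_000_000}
FPGA = {"ops_per_cycle": 200, "clock_hz": 100_000_000}

processor_rate = ops_per_second(**PROCESSOR)  # 800 million ops/s
fpga_rate = ops_per_second(**FPGA)            # 20 billion ops/s

# The FPGA's edge holds only if the algorithm really exposes this much
# parallelism and the whole problem fits within the FPGA's capacity.
fpga_advantage = fpga_rate / processor_rate
```

Under these assumed numbers the FPGA wins on raw throughput
despite its slower clock, but the sketch says nothing about
die size or chip cost, which the text notes tend to favor
the processor.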
Programmability versus Efficiency
All these flavors
of programmability allow the underlying silicon
design to be somewhat generic while permitting
configuration or personalization for a specific
situation at the time that the system is booted
or during system operation. The traditional problem
with programmability is that there’s a
tremendous gap in efficiency and/or performance
between a hard-wired design and a design with
the same function implemented with programmable
technology. This gap could be called the “programmability
overhead”.
This overhead may be defined as the increased
area for implementation of a function using programmable
methods, compared to a hardwired implementation
with the same performance. Alternatively, the
overhead may be defined as the increase in execution
time of a programmable design solution, compared
to a hardwired implementation of the same silicon
area. As a rule of thumb, the overhead for FPGA
or generic processor programmability is more
than a factor of ten, and can reach a factor
of one hundred. For example, hardwired logic
solutions are typically about 100x faster for
security applications such as DES and AES encryption
than the same tasks implemented with a general-purpose
RISC processor. An FPGA implementation of these
encryption functions may run at only 3-4x lower
clock frequency than hardwired logic, but may
require 10-20x more silicon area.
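The two definitions of programmability overhead given above
are simple ratios, and the FPGA encryption figures can be
combined into a single area-time penalty. The Python sketch
below does this. The combined area-time product is an
assumption introduced here for illustration, not a metric
defined in the text, and it presumes that both
implementations complete the same work per clock cycle.

```python
def area_overhead(programmable_area, hardwired_area):
    """Area ratio, comparing implementations with the same performance."""
    return programmable_area / hardwired_area


def time_overhead(programmable_time, hardwired_time):
    """Execution-time ratio, comparing implementations of the same area."""
    return programmable_time / hardwired_time


def area_time_overhead(clock_slowdown, area_ratio):
    """Assumed combined penalty when a design is both slower and larger."""
    return clock_slowdown * area_ratio


# FPGA encryption figures quoted in the text:
# 3-4x lower clock frequency, 10-20x more silicon area.
fpga_low = area_time_overhead(3, 10)
fpga_high = area_time_overhead(4, 20)

# General-purpose RISC figure quoted in the text: about 100x slower.
risc_time = time_overhead(100.0, 1.0)
```

With these inputs the FPGA penalty lands between 30x and
80x, which sits inside the rule-of-thumb range of ten to
one hundred quoted above.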
These inefficiencies stem from the excessive
generality of the universal digital substrates:
FPGAs and general-purpose processors. The designers
of these general-purpose substrates are working
to construct platforms to cover all possible
scenarios. Unfortunately, the creation of truly
general-purpose substrates requires a superabundance
of basic facilities from which to fabricate specific
computational functions and connection paths
to move data among computation functions. Silicon
efficiency is constrained by the limited reuse
or “time-multiplexing” of the transistors
implementing an application’s essential
functions.
In fact, if you look at either an FPGA or a
general-purpose processor performing an “add” computation,
you will find a group of logic gates comprising
an adder surrounded by a vast number of multiplexers
and wires to deliver the right data to the right
adder at the right moment. The circuit overhead
associated with storing and moving the data and
selecting the correct sequence of functions leads
to much higher circuit delays and a much larger
number of required transistors and wires than
a design where the sequence of operations to
be performed is known. General-purpose processors
rely on time-multiplexing of a small and basic
set of function units for basic arithmetic and
logical operations and memory references. Most
of the processor logic serves as hardware to
route different operands to the small set of
shared hardwired function units. Communication
among functions is implicit in the reuse of processor
registers and memory locations by different operations.
FPGA logic, by contrast, minimizes the implicit
sharing of hardware among different functions.
Instead, each function is statically mapped to
a particular region of the FPGA silicon, so each
transistor typically performs a single function
repeatedly. Communication among functions is
explicit in the static configuration of interconnect
among functions.
The more that is known about the required computation,
the more the transistors involved in the computation
can be interconnected with dedicated wires to
enable high utilization of computational units.
Both general-purpose processors and general-purpose
FPGA technologies have overhead, but an exploration
of software programmability highlights the hidden
overhead of field hardware programmability.
Modern software programmability’s power
really stems from two complementary characteristics.
One of these is abstraction. Software programs
allow developers to deal with computation in
a form that is more concise, more readily understood
at a glance, and more easily enhanced independent
of implementation details. Abstraction yields
insight into overall solution structure by hiding
the implementation details. Modest-sized software
teams routinely develop, reuse, and enhance applications
with hundreds of thousands of lines of source
code, including extensive reuse of operating
systems, application libraries, and middleware
software components. In addition, sophisticated
application-analysis tools have evolved to help
teams debug and maintain these complex applications.
By comparison, similar-sized hardware teams
consider logic functions with tens of thousands
of lines of Verilog or VHDL code to be quite
large and complex. Blocks are modified only with
the greatest care. Coding abstraction is limited—simple
registers, memories and primitive arithmetic
functions may constitute the most complex generic
reusable hardware functions in a block.
The second characteristic is software’s
ease of modification. System functionality changes
when you first boot the system and it changes
dynamically when you switch tasks. In a software-driven
environment, if a task requires a complete change
to a subsystem’s functionality, the system
can load a new software-based personality for
that subsystem from memory in a few microseconds
in response to changing system demands.
This is a key point: the economic benefits of
software flexibility appear both in the development
cycle (what happens between product conception
and system “power-on”) and during
the expected operational life of a design (what
happens between freezing the product specification
and the moment when the last variant of the product
is shipped to the last customer).
Continuous adaptability to new product requirements
plays a central role in improving product profitability.
If the system can be reprogrammed quickly and
cheaply, the developers reduce the risk of failing
to meet design specifications and have greater
opportunity to quickly adapt the product to new
customer needs. Field-upgrading software has
become routine with PC systems and the ability
to upgrade software in the field is starting
to find its way into embedded products. For example,
products such as Cisco network switches get regular
software upgrades. The greater the flexibility
(at a given cost), the greater the number of customers
and the higher the SOC volume.
In contrast, hardwired design choices must be
made very early in the system design cycle. If,
at any point in the development cycle—during
design, during prototyping, during field trials,
during upgrades in the field, or during second-generation
product development—the system developers
decide to change key computation or communication
decisions, then it’s back to square one
for the system design. Hard-wiring key design
decisions also narrows the potential range of
customers and systems into which the SOC might
fit and limits the potential volume shipments.
Making systems more programmable has benefits
and liabilities. The benefits are seen in development
agility and efficiency. Designers don’t
need to decide exactly how the computational
elements relate to each other until later in
the design cycle, when the cost of change is
low. Whether programmability comes through loading
a net list into an FPGA or software into processors,
the designer doesn’t have to decide on
a final configuration until system power up.
If programmability is realized through software
running on processors, developers may defer many
design decisions until system bring-up. Some
decisions can be deferred until the eve of product
shipment. The liabilities of programmability
have historically been cost, power efficiency,
and performance. Conventional processor and FPGA
programmability carries an overhead of thousands
of transistors and thousands of microns of wire
between the computational functions. This overhead
translates into large die size and low clock
frequency for FPGAs and long execution time for
processors, compared to hardwired logic implementing
the same function. Fixing the implementation
hardware in silicon can dramatically improve
unit cost and system performance but dramatically
raises design risk and design cost. The design
team faces trying choices. Which functions should
be implemented in hardware? Which in software?
Which functions are most likely to change? How
will communication among blocks evolve?
This tradeoff between efficiency and performance
on one hand, and programmability and flexibility
on the other, is a recurring theme in this
book. Figure 2 gives a conceptual view of this
tradeoff.

Figure 2. The essential tradeoff of design.
The vertical axis indicates the intrinsic complexity
of a block or of the whole system. The horizontal
axis indicates the performance or efficiency
requirement of a block or the whole system. One
recurring dilemma of SOC design is this: solutions
that have the flexibility to support complex
designs sacrifice performance, efficiency, and
throughput; solutions with high efficiency or
throughput often sacrifice the flexibility necessary
to handle complex applications.
The curves in Figure 2 represent overall design
styles or methodologies. Within a methodology,
a variety of solutions are possible for each
block or for the whole system, but the design
team must trade off complexity and high efficiency.
An improved design style or methodology may still
exhibit tradeoffs in solutions, but with an overall
improvement in the “flexibility-efficiency
product.” In moving to an improved design
methodology, the team may focus on improving
programmability (to get better flexibility at
a given level of performance) or it may focus
on improving performance (to get better efficiency
and throughput at a given level of programmability).
In a sense, the key to efficient SOC design
is managing uncertainty. If the design team can
optimize all dimensions of the product for which
requirements are stable and leave flexible all
dimensions of the product that are unstable, they
will have a cheaper, more durable, and more efficient
product than their competitors.
The Key to SOC Design Success: Domain-Specific
Flexibility
The goal of SOC development is to strike an
optimal balance: enough flexibility in the SOC
to meet changing demands within a problem domain,
while still realizing the efficiency
and optimality associated with targeting an end
application. Therefore, what’s needed is
an SOC design methodology that permits a high
degree of system and subsystem parallelism, an
appropriate degree of programmability, and rapid
design.
It is not necessary for SOC-based system developers
to use a completely universal piece of silicon—most
SOCs ship in enough volume to justify specialization.
For example, a designer of digital cameras doesn’t
need to use the same chip that’s used in
a high-end optical network switch. One camera
chip, however, can support a range of related
consumer imaging products, as shown in Figure
3.

Figure 3. One camera SOC into many camera systems
The difference in benefit derived from a chip
shared by ten similar designs, versus one shared
by 1,000 designs, is relatively modest. If each
camera design’s volume is 200,000 units,
and an SOC design costing $10M is shared by ten
such designs (2 million units in all), then the
SOC design contributes $5 to final camera cost
(~5%). Sharing the SOC design across 1,000 designs
would cut that to about $0.05, saving almost $5
in amortized design cost, but would
almost certainly require such generality in the
SOC that SOC production costs would increase
by far more than $5. SOC designs need not be
completely universal—high-volume products
can easily afford to have a chip-level design
platform that is appropriate to their application
domain, yet flexible within it.
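The amortization argument above can be checked with a few
lines of Python. The sketch uses only the camera figures
quoted in the text (a $10M SOC design, 200,000 units per
camera design, and ten versus 1,000 designs sharing the
chip); the function and variable names are illustrative.

```python
def design_cost_per_unit(design_cost, designs_sharing, units_per_design):
    """Spread a one-time SOC design cost over every unit shipped."""
    return design_cost / (designs_sharing * units_per_design)


SOC_DESIGN_COST = 10_000_000   # dollars, from the text
UNITS_PER_DESIGN = 200_000     # units per camera design, from the text

shared_by_10 = design_cost_per_unit(SOC_DESIGN_COST, 10, UNITS_PER_DESIGN)
shared_by_1000 = design_cost_per_unit(SOC_DESIGN_COST, 1000, UNITS_PER_DESIGN)

# Extra saving per camera from going all the way to 1,000 designs.
extra_saving = shared_by_10 - shared_by_1000

if __name__ == "__main__":
    print(shared_by_10, shared_by_1000, extra_saving)
```

Moving from ten to 1,000 designs saves a little under $5
per unit, so the extra generality pays off only if it adds
less than that to the production cost of each chip.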
If designers have sufficient flexibility within
an SOC to adapt to any tasks they are likely
to encounter during that design’s lifetime,
then they essentially have all the relevant benefits
of universal flexibility without much of the
overhead of universal generality. If the platform
design is done correctly, the cost for this application-specific
flexibility is much lower than the cost of the
flexibility derived from a truly universal device
such as an FPGA or a high-performance,
general-purpose processor.
In addition, a good design methodology should
enable as broad a population of hardware and
software engineers as possible to design and
program the SOCs. The larger the talent pool,
the faster the development and the lower the
project cost.
The key characteristics for such an SOC design
methodology are:
- Support for concurrent processing,
- Appropriate application efficiency, and
- Ease of development by people who are not
necessarily SOC design specialists. (The people
using this design methodology to develop SOCs
may well be IC-design specialists, but they
are far more likely to be specialists
in the specific application domain of interest
to the design team.)
An Improved Design Methodology
for SOC Design
A fundamentally new way to speed
development of mega-gate SOCs is emerging.
First, processors replace hardwired logic to
accelerate hardware design and bring full chip-level
programmability. Second, those processors are
extended, often automatically, to run functions
very efficiently with high throughput, low power
dissipation, and modest silicon area. Blocks
based on extended processors often have characteristics
that rival those of the rigid RTL blocks they
replace. Third, these processors become the basic
building blocks for complete SOCs, where the
rapid development, flexible interfacing, and
easy programming and debugging of the processors
accelerate the overall design process. Finally,
and perhaps most importantly, the resulting SOC-based
products are highly efficient and highly adaptable
to changing requirements. This improved SOC design
flow allows full exploitation of the intrinsic
technological potential of nanometer semiconductors
(parallelism, pipelining, fast transistors,
application-specific operations) and the benefits
of modern software development methodology.
A sketch of the new SOC design flow appears
in Figure 4.

Figure 4. MPSOC design flow overview
The flow starts from the high-level requirements,
especially the external input and output requirements
for the new SOC platform and the set of tasks
that the system performs on the data flowing
through the system. The computation within tasks
and the communication among tasks and interfaces
are optimized using application-specific processors
and quick function-, performance-, and cost-analysis
tools. The flow makes an accurate system model
available early in the design schedule, so detailed
VLSI and software implementations can proceed
in parallel. Early and accurate modeling of both
hardware and software reduces development time
and minimizes expensive surprises late in the
development and bring-up of the entire system.
Using this design approach means that designers
can move through the design process with fewer
dead ends and false starts and without the need
to back up and start over. It means that SOC
designers can make a much fuller and more detailed
exploration of the design possibilities early
in the design cycle. Using this approach, they
can better understand the design’s hardware
costs, application performance, interface, programming
model, and all the other important characteristics
of an SOC’s design.
Taking this approach to designing SOCs means
that the cone of possible, efficient uses of
the silicon platform will be as large as possible
with the fewest compromises in the cost and the
power efficiency of that platform. The more a
design team uses the application-specific processor
as the basic SOC building block—as opposed
to hard-wired logic written as RTL—the
more the SOC will be able to exploit the flexibility
inherent in a software-centric design approach.
The Configurable Processor as Building Block
The basic building block of this methodology
is a new type of microprocessor: the configurable,
extensible microprocessor core. These processors
are created by a generator that transforms high-level
application-domain requirements (in the form
of instruction-set descriptions or even examples
of the application code) into an efficient hardware
design and software tools. The “sea of
processors” approach to SOC design allows
engineers without microprocessor design experience
to specify, evaluate, configure, program, interconnect,
and compose those basic building blocks into
combinations of processors that together create
the essential digital electronics for SOC devices.
To develop a processor configuration using one
of these configurable microprocessor cores, the
chip designer or application expert comes to
the processor-generator interface (shown in Figure
5) and selects or describes the application source,
instruction-set options, memory hierarchy, closely-coupled
peripherals, and interfaces required by the application.
It takes about one hour to fully generate both the
hardware design (in the form of standard
RTL languages, EDA tool scripts, and test benches)
and the software-development environment (C and
C++ compilers, debuggers, simulators, RTOS code,
and other support software). The generation process
provides immediate availability of a fab-portable
hardware implementation and software development
environment. This timely delivery of the hardware
and software infrastructure permits rapid tuning
and testing of the software applications on that
processor design. The completeness of the generated software
environment largely eliminates the issues of software porting
and enables rapid design iteration. The tools
can even be configured to automatically explore
a wide range of possible processor configurations
from a common application base to reveal better
hardware solutions, as measured by target
application requirements.
Figure 5. Basic Processor Generator Flow
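The automatic exploration of processor configurations described above can be pictured with a small sketch. The code below is only an illustration under stated assumptions: the option names (mac_unit, simd_width, cache_kb), the cost formulas, and all numbers are hypothetical and do not describe any real processor generator. It enumerates candidate configurations, estimates cycles and area with a toy model, and picks the smallest configuration that meets a cycle budget for the target application.

```python
from dataclasses import dataclass
from itertools import product
from typing import Optional, Tuple


@dataclass(frozen=True)
class Config:
    """One candidate processor configuration (all options are hypothetical)."""
    mac_unit: bool    # add a multiply-accumulate unit
    simd_width: int   # lanes in a custom SIMD instruction
    cache_kb: int     # data-cache size in KB


def estimate(cfg: Config) -> Tuple[float, int]:
    """Toy cost model returning (cycles, area) in arbitrary units.

    A real flow would take these figures from simulation and
    synthesis estimates rather than from closed-form formulas.
    """
    cycles = 1_000_000.0
    if cfg.mac_unit:
        cycles *= 0.5                       # assumed speedup from the MAC unit
    cycles /= cfg.simd_width                # assumed ideal SIMD scaling
    cycles *= {4: 1.3, 8: 1.1, 16: 1.0}[cfg.cache_kb]  # assumed miss penalty
    area = 100 + (30 if cfg.mac_unit else 0)
    area += 20 * (cfg.simd_width - 1) + 2 * cfg.cache_kb
    return cycles, area


def explore(cycle_budget: float) -> Optional[Config]:
    """Return the lowest-area configuration meeting the cycle budget, if any."""
    candidates = [
        Config(mac, simd, cache)
        for mac, simd, cache in product([False, True], [1, 2, 4], [4, 8, 16])
    ]
    feasible = [c for c in candidates if estimate(c)[0] <= cycle_budget]
    if not feasible:
        return None
    return min(feasible, key=lambda c: estimate(c)[1])


if __name__ == "__main__":
    best = explore(cycle_budget=200_000)
    print(best, estimate(best) if best else None)
```

The selection rule used here (minimum area subject to a performance constraint) is one plausible reading of "as measured by target application requirements"; other objectives, such as energy or code size, would slot into the same loop.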
The application-specific processor performs
all of the same tasks that a microcontroller
or a high-end RISC processor can perform: run
applications developed in high-level languages;
implement a wide variety of real-time features;
and support complex protocol stacks, libraries,
and application layers. Application-specific
processors perform generic integer tasks very
efficiently, even as measured by traditional
microprocessor power, speed, area, and code-size
criteria. But because these application-specific
processors are able to incorporate the data paths,
instructions, and register storage for the idiosyncratic
data types and computation required by an embedded
application, they can also support virtually
all of the functions that chip designers have
historically implemented as hard-wired logic.
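As a rough illustration of why application-specific instructions and data paths matter, the sketch below models one idiosyncratic data type: four unsigned 8-bit values packed into a 32-bit word, added with saturation. The operation counts are a simplified assumption, and sat_add_packed_custom stands in for a hypothetical single custom instruction; neither is taken from any real instruction set.

```python
from typing import Tuple


def sat_add_packed_generic(a: int, b: int) -> Tuple[int, int]:
    """Saturating add of four packed unsigned 8-bit lanes.

    Uses only generic integer operations and returns (result, op_count),
    where op_count is a rough tally of the generic operations involved.
    """
    result = 0
    ops = 0
    for lane in range(4):
        shift = 8 * lane
        x = (a >> shift) & 0xFF    # shift + mask
        y = (b >> shift) & 0xFF    # shift + mask
        s = x + y                  # add
        if s > 0xFF:               # compare, then clamp to the lane maximum
            s = 0xFF
        result |= s << shift       # shift + or
        ops += 8                   # 8 generic operations counted per lane
    return result, ops


def sat_add_packed_custom(a: int, b: int) -> Tuple[int, int]:
    """Model of a hypothetical custom instruction with the same semantics.

    The result is identical; the whole computation is counted as one
    operation because a dedicated data path handles all lanes at once.
    """
    result, _ = sat_add_packed_generic(a, b)
    return result, 1


if __name__ == "__main__":
    print(sat_add_packed_generic(0x10F0FF01, 0x10200102))
    print(sat_add_packed_custom(0x10F0FF01, 0x10200102))
```

Both functions compute the same value; the difference in the counted operations is the kind of gap that a processor extended with application-specific data paths is meant to close.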
The Transition to MPSOC Design
The transition from conventional SOC design
methodology to a new multiple-processor SOC (MPSOC)
design methodology offers two fundamental benefits.
First, it brings flexibility and speed of design
to traditionally hardwired functions. The performance
and energy efficiency of configurable processors
far exceed those of conventional processors and
rival the capability of hardwired logic functions,
but with simpler, faster initial design and thorough
post-silicon programmability. For many design
teams, automatic generation of processors is
displacing RTL design for their more complex,
data-intensive function blocks. This transition
has the greatest impact in SOC applications where
system complexity is on a collision course with
bandwidth or bandwidth efficiency. Second, the
more pervasive use of configurable processors
as the basic building block of choice simplifies
system-level design. Across all the functions
implemented with application-specific processor
configurations, hardware and software teams use
one set of hardware interfaces, software tools,
simulation models and debug methods. This unification
reduces design time, misunderstandings between
hardware and software developers, and the risk of
failure.
Note: This Tensilica White Paper is based on
the book Engineering the Complex SOC by Chris
Rowen, to be published in June 2004 by Prentice
Hall.