Design SOC Hardware with TIE (the Tensilica Instruction Extension language), not RTL
In traditional, RTL-based design methodologies,
system designers seeking to implement a new function
start with a specification and begin capturing
that specification in a hardware description language
such as Verilog or VHDL. That transformation process
usually includes analyzing the bandwidth and latency
constraints of the function in the given application
and then determining the computational elements
- or datapath - needed to deliver the desired performance
given the chip’s design constraints (gate
count, power dissipation, etc.). The design approach
also requires the creation of a state machine (also
using an HDL) to sequence the data through the
custom datapath elements in a manner that executes
all the necessary permutations of the desired algorithms.
Designing a new functional block using the Xtensa
processor and TIE extensions follows much the same
process. The Xtensa processor’s various bus
interfaces scale from 32 to 128 bits to provide
I/O bandwidth exceeding 40 Gbit/s in 0.13-micron
IC-fabrication processes. Designer-defined execution
units of very high complexity, including
multiple independent operations, SIMD parallelism,
and multi-cycle execution, can be implemented
in TIE in much the same fashion as a designer would
define the datapath of an RTL block. Advanced software
pipelining techniques can also be used to combine
data load/store operations with computational operations
to achieve continuous computation without the added
latency of serialized load-compute-store sequences.
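To make the benefit of overlapping load, compute, and store concrete, here is a minimal cycle-count sketch in Python. It is a simplified pipeline model, not a description of actual Xtensa timing: the function names and the one-cycle stage costs are assumptions chosen only for illustration.

```python
def serialized_cycles(n_items, load=1, compute=1, store=1):
    """Cycles when each item is loaded, computed, and stored back to back."""
    return n_items * (load + compute + store)


def pipelined_cycles(n_items, load=1, compute=1, store=1):
    """Cycles when load, compute, and store of neighbouring items overlap.

    Simple pipeline model: the first item pays the full latency, and each
    further item completes one slowest-stage interval later.
    """
    if n_items == 0:
        return 0
    latency = load + compute + store
    interval = max(load, compute, store)
    return latency + (n_items - 1) * interval


# Hypothetical workload: 1000 data items, one cycle per stage (assumed).
serial = serialized_cycles(1000)     # 1000 * 3 cycles
overlapped = pipelined_cycles(1000)  # 3 + 999 * 1 cycles
```

Under these assumed stage costs the overlapped schedule approaches one result per cycle, while the serialized schedule needs three cycles per result.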
Adding TIE instructions to a Tensilica processor core never compromises the underlying base Xtensa instruction set, thereby ensuring availability of a robust ecosystem of third-party application software and development tools. All configurable, extensible Xtensa processors are always compatible with major operating systems, debug probes, and ICE solutions, and always come with an automatically generated, complete software development toolchain including an advanced integrated development environment based on the Eclipse framework, a world-class compiler, a cycle-accurate SystemC-compatible instruction set simulator, and the full industry-standard GNU toolchain.
Faster Verification, Lower Risk
Once the Xtensa processor datapath additions have
been designed using TIE, implementation of the
target algorithm occurs by writing firmware to
sequence the data through the tailored processor.
While RTL datapath hardware and TIE-built execution
units can deliver similar levels of parallelism
and performance, the true advantage of the TIE-based
methodology is in the design and debug of the control
algorithm.
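The split between a tailored datapath operation and the firmware that sequences data through it can be sketched with a small behavioral model. This is Python, not TIE or Xtensa firmware, and the operation `simd_add8` (a four-lane, 8-bit wraparound add on a 32-bit word) is a hypothetical example rather than an instruction described in this article.

```python
LANES = 4          # assumption: four 8-bit lanes packed in a 32-bit word
LANE_MASK = 0xFF


def simd_add8(a, b):
    """Model of a hypothetical custom operation: lane-wise 8-bit add, wrapping on overflow."""
    result = 0
    for lane in range(LANES):
        shift = 8 * lane
        lane_sum = ((a >> shift) & LANE_MASK) + ((b >> shift) & LANE_MASK)
        result |= (lane_sum & LANE_MASK) << shift
    return result


def firmware_add_buffers(src_a, src_b):
    """Control algorithm as firmware: an ordinary loop sequences words through the operation."""
    return [simd_add8(a, b) for a, b in zip(src_a, src_b)]


# Two words per buffer; the second word overflows in its top lane.
out = firmware_add_buffers([0x01020304, 0xFF000001],
                           [0x10203040, 0x01000001])
```

In this sketch the datapath work lives entirely in `simd_add8`, while the control flow is ordinary software that can be changed without touching the operation itself, which is the property the paragraph above attributes to the firmware-based approach.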
RTL design methods implement the algorithm using
hardwired gates. A design method based on the Xtensa
processor implements the algorithm as firmware.
The approach based on the Xtensa processor offers
many advantages. Software running on the Xtensa
instruction set simulator (ISS) simulates at speeds
of a million cycles per second on a simple PC host.
Gates in a hard-coded state machine built in Verilog
or VHDL must be verified at the RTL or netlist
level using simulation environments that run at
only tens of cycles per second. In addition, the
difficult-to-design state-machine logic required
when using an RTL-based design approach is automatically
generated by the Xtensa Processor Generator. The
Xtensa processor’s control logic is pre-verified
and correct by construction. Consequently, verification
of systems based on Xtensa processors requires
much less time.
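The gap between the two simulation speeds is easier to appreciate with a quick calculation. The sketch below uses the ISS figure quoted above; the RTL rate of 50 cycles per second is an assumption picked from the "tens of cycles per second" range, and the 100-million-cycle workload is hypothetical.

```python
ISS_CYCLES_PER_SEC = 1_000_000  # figure quoted in the text
RTL_CYCLES_PER_SEC = 50         # assumption: one value in the "tens of cycles" range


def simulation_seconds(cycles, cycles_per_sec):
    """Wall-clock seconds needed to simulate the given number of cycles."""
    return cycles / cycles_per_sec


workload = 100_000_000  # hypothetical test run of 100 million cycles

iss_time = simulation_seconds(workload, ISS_CYCLES_PER_SEC)  # 100 seconds
rtl_time = simulation_seconds(workload, RTL_CYCLES_PER_SEC)  # 2,000,000 seconds
speedup = rtl_time / iss_time                                # 20,000x
```

Under these assumptions, a test that finishes in under two minutes on the ISS would take roughly 23 days at the RTL level.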
Bugs found in the firmware running on an Xtensa
processor can be repaired in minutes with a simple
software change. Bugs found in hardwired gates
after chip tape-out require million-dollar mask-set
changes and months of delay to fabricate new silicon.
Replace months of HDL design simulation before
tape-out with Tensilica’s faster, more comprehensive
software simulation environment, and replace the
multi-million-dollar, multi-month silicon re-spin
path to fix bugs with a post-silicon “bug fix” cycle
that entails only a 10-minute recompilation of
firmware code.
Use Processors, Not Gates
For an excellent example of how a configurable
processor can be extended, see the article “MPEG-4
is accelerated and footprint reduced by use of
a configurable processor core.”