ConnX BBE64 (Baseband Engine) Family

64-128 MAC/Cycle High-Performance DSP

Features

  • High-performance DSP with 64 or 128 simultaneous 18x18-bit MACs/cycle supporting more than 100 billion high-precision multiply/accumulate operations per second
  • Supports a rich variety of complex arithmetic operations and efficient matrix processing with SIMD acceleration
  • 16-way radix-8 FFT butterfly steps, 16-32 complex tap FIR or 64-128 real tap FIR operations (256 real symmetric FIR taps) per cycle
  • Instruction set optimized for high-performance OFDM and MIMO based communication baseband designs
  • Wide vector processing pipeline with up to 32-way parallel SIMD operations and 4-issue VLIW for efficient parallel load/store and compute operations
  • Parallel 640b wide vector register files with support for 1b x 640, 20b x 16 and optionally 10b x 64 vector types
  • Dual 512-bit load/store units
  • Based on the Xtensa LX platform with rich customization and extension capabilities
  • Extensible interfaces with customized FIFO, port and lookup interfaces

Benefits

  • Up to 128 GMACs/sec DSP performance
  • High I/O throughput with the ability to create custom interfaces to hardwired custom coprocessors and RTL blocks
  • Efficient support of scalar, vector, and matrix operations for real and complex data
  • High efficiency allows operation at low clock speeds for low-power designs
  • Comprehensive Tensilica multicore tools infrastructure enables use in high-performance cellular base station applications

High Performance DSPs for LTE Advanced

The ConnX BBE64 Baseband Engine family is based on an ultra-high performance DSP designed for use in next-generation communication baseband processors in LTE Advanced and other next-generation 4G cellular radios and multi-standard broadcast receivers. The high computation requirements of such applications call for new, innovative architectures with a high degree of parallelism and efficient I/O. The ConnX BBE64 family meets these needs by combining a 32-way SIMD, 4-issue VLIW processing pipeline with a rich and extensible set of interfaces. With the option to add 64 extra multipliers, the ConnX BBE64-128 offers up to 128 MACs/cycle.

The ConnX BBE64 family is built around a set of versatile pipelined execution units, including flexible-precision real and complex multiply-add units, adders, and bit manipulation, shift and normalization, select, shuffle, and interleave units. The results of all these operations can be kept at an extended precision of 40 bits per component, or truncated, rounded, saturated, and shifted to meet the needs of different algorithms and implementations.
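As a rough illustration of what extended-precision accumulation followed by rounding and saturation means, the sketch below models it in portable scalar C. It is not BBE64 code: the helper names `sat40`, `mac40`, and `round_sat16` are hypothetical, and the round-half-up rule and the saturate-on-overflow accumulator are assumptions chosen for the example, not documented hardware behavior.

```c
#include <stdint.h>

/* Hypothetical helper: clamp a value to the signed 40-bit range,
   modeling an accumulator with guard bits above 32 bits. */
static int64_t sat40(int64_t x)
{
    const int64_t max40 = ((int64_t)1 << 39) - 1;
    const int64_t min40 = -((int64_t)1 << 39);
    if (x > max40) return max40;
    if (x < min40) return min40;
    return x;
}

/* Hypothetical helper: one multiply-accumulate step into a 40-bit
   accumulator. Saturating here is an assumption of this sketch. */
static int64_t mac40(int64_t acc, int16_t a, int16_t b)
{
    return sat40(acc + (int64_t)a * b);
}

/* Hypothetical helper: shift the accumulator right by `shift` bits with
   round-half-up, then saturate the result to 16 bits.
   Note: right-shifting a negative value is implementation-defined in C;
   mainstream compilers perform an arithmetic shift. */
static int16_t round_sat16(int64_t acc, unsigned shift)
{
    int64_t rounded = acc;
    if (shift > 0) {
        rounded = (acc + ((int64_t)1 << (shift - 1))) >> shift;
    }
    if (rounded > INT16_MAX) return INT16_MAX;
    if (rounded < INT16_MIN) return INT16_MIN;
    return (int16_t)rounded;
}
```

The point of the wide accumulator is that many 16-bit products can be summed before any precision is lost; rounding and saturation are applied once, when the result is narrowed.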

The optional accelerated FIR unit offers even higher performance for a wide range of filtering tasks, including complex data (real coefficient at 64 taps/cycle and complex coefficient at 32 taps/cycle) and real data (symmetric real coefficient at 256 taps/cycle and asymmetric real coefficient at 128 taps/cycle).
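One common reason a symmetric filter can process twice as many taps per multiplier is the pre-add trick: when h[k] equals h[N-1-k], the two samples sharing a coefficient are added first and multiplied once. The scalar C sketch below shows the idea. Whether the BBE64 FIR unit works exactly this way is an assumption; the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Direct form: one multiply per tap.
   x points at the oldest sample of an n_taps-long window. */
static int64_t fir_direct(const int16_t *x, const int16_t *h, size_t n_taps)
{
    int64_t acc = 0;
    for (size_t k = 0; k < n_taps; ++k) {
        acc += (int64_t)h[k] * x[n_taps - 1 - k];
    }
    return acc;
}

/* Symmetric form, valid for even n_taps with h[k] == h[n_taps-1-k].
   Only the first half of the coefficients is read, and each multiply
   covers two taps thanks to the pre-add. */
static int64_t fir_symmetric(const int16_t *x, const int16_t *h_half,
                             size_t n_taps)
{
    int64_t acc = 0;
    for (size_t k = 0; k < n_taps / 2; ++k) {
        int32_t pre_add = (int32_t)x[k] + x[n_taps - 1 - k];
        acc += (int64_t)h_half[k] * pre_add;
    }
    return acc;
}
```

For an 8-tap symmetric filter, `fir_symmetric` performs 4 multiplies where `fir_direct` performs 8, and both return the same sum.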

The ConnX BBE64 architecture is code-compatible with ConnX BBE16 and further expands on the Boolean predication architecture of ConnX BBE16. This enables the compiler to achieve high throughput with vectorization even on complex functions with conditional operations embedded in the inner loops.
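To make the role of predication concrete, here is a minimal scalar C sketch of the same loop written with a branch and with a per-element Boolean plus select. The second shape is the one that maps naturally onto predicated SIMD lanes. The function names are hypothetical, and how a particular compiler actually converts the branch is an assumption of this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Branchy form: the conditional sits inside the inner loop. */
static void clip_branchy(int16_t *y, const int16_t *x, size_t n,
                         int16_t limit)
{
    for (size_t i = 0; i < n; ++i) {
        if (x[i] > limit) {
            y[i] = limit;
        } else {
            y[i] = x[i];
        }
    }
}

/* Predicated form: compute a Boolean per element, then select.
   Every iteration does the same work, so there is no control flow
   that differs from lane to lane. */
static void clip_predicated(int16_t *y, const int16_t *x, size_t n,
                            int16_t limit)
{
    for (size_t i = 0; i < n; ++i) {
        int pred = x[i] > limit;        /* one predicate bit per element */
        y[i] = pred ? limit : x[i];     /* select between two values */
    }
}
```

Both functions produce identical output; the difference is only in how easily the loop body can be executed for many elements at once.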

The ConnX BBE64 family supports programming in C with a vectorizing compiler. Automatic vectorization of scalar C and full support for vector datatypes allow the development of algorithms without the need to program at the assembly level. Native C operator overloading is supported for natural programming with standard C operators on real and complex vector data types.
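The kind of scalar C loop that a vectorizing compiler can typically handle looks like the complex dot product sketched below. It uses only standard C99: the `cint16` and `cacc` types and the `cdot` function are hypothetical names for this example, not Tensilica vector types or intrinsics, and no claim is made about the code a specific compiler generates.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct { int16_t re, im; } cint16;  /* hypothetical complex sample */
typedef struct { int64_t re, im; } cacc;    /* hypothetical wide accumulator */

/* Complex dot product: sum of a[i] * b[i].
   `restrict` tells the compiler the arrays do not overlap, and the loop
   has a simple trip count with no cross-iteration dependence other than
   the accumulation, which helps automatic vectorization. */
static cacc cdot(const cint16 *restrict a, const cint16 *restrict b,
                 size_t n)
{
    cacc acc = {0, 0};
    for (size_t i = 0; i < n; ++i) {
        acc.re += (int64_t)a[i].re * b[i].re - (int64_t)a[i].im * b[i].im;
        acc.im += (int64_t)a[i].re * b[i].im + (int64_t)a[i].im * b[i].re;
    }
    return acc;
}
```

Each iteration performs four real multiplies, which is why complex MAC throughput is usually quoted separately from real MAC throughput.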

Two Processors in BBE64 Family

The BBE64 family includes an extensive feature set specifically optimized for LTE Advanced wireless. To meet the needs of handsets and infrastructure, Tensilica created two processors, both of which can be further tailored to meet application requirements:

  • BBE64-128: For infrastructure applications, this high-performance processor can perform at 128 MACs/cycle. It uses the option for a second slot of 64 multipliers, which is particularly helpful for the FIR filters and matrix operations required by LTE Advanced macrocells.
  • BBE64-UE: For user equipment (handsets), this is the power-optimized, efficient version with a minimal feature set and a smaller pipeline for minimum energy and latency. Ideal for interfacing with low-power specialized engines, this high-efficiency processor can reach approximately 300 GMACs/second/Watt in 28nm low-leakage process technology.

Instruction Set

The ConnX BBE64 processors are options for the popular Xtensa dataplane processor (DPU). The power of the ConnX BBE64 architecture comes from a comprehensive DSP and baseband instruction set with over 100 instructions.

A wide variety of load/store operations supports six different addressing modes with support for 16b/32b scalar and vector data types. Unaligned loads and stores with masking deliver full-bandwidth access to unaligned data. Vector data management is supported with data packing and shifting.

Multiply operations include complex and scalar 18bx18b multiply, multiply-round and multiply-add functions. Complex-number functions include support for conjugate arithmetic and magnitude operations as well as full-precision arithmetic and saturated/rounded outputs. The ConnX BBE64 is capable of performing up to 128 multiplies per operation. BBE64 includes extended precision with guard bits on all register data, full support for double-precision data, and 40-bit accumulation on MAC operations without performance penalty. A wide variety of arithmetic, logical and shift operations are supported for up to 96 operations per cycle. There is full support for matrix multiplication for both packed and component data representation, with acceleration for OFDM matrix operations.
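For readers less familiar with the complex-number operations named above, the following scalar C sketch spells out conjugate multiplication and squared magnitude. The type `cpx32` and the functions `cmul_conj` and `cmag2` are hypothetical names for illustration and do not correspond to actual BBE64 instructions.

```c
#include <stdint.h>

typedef struct { int32_t re, im; } cpx32;   /* hypothetical complex type */
typedef struct { int64_t re, im; } cpx64;   /* hypothetical wide result */

/* a * conj(b) = (ar*br + ai*bi) + j(ai*br - ar*bi).
   Commonly used for correlation and channel estimation. */
static cpx64 cmul_conj(cpx32 a, cpx32 b)
{
    cpx64 r;
    r.re = (int64_t)a.re * b.re + (int64_t)a.im * b.im;
    r.im = (int64_t)a.im * b.re - (int64_t)a.re * b.im;
    return r;
}

/* |a|^2 = ar^2 + ai^2, which equals the real part of a * conj(a). */
static int64_t cmag2(cpx32 a)
{
    return (int64_t)a.re * a.re + (int64_t)a.im * a.im;
}
```

Multiplying a value by its own conjugate gives a purely real result equal to the squared magnitude, which is a handy consistency check when testing such routines.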

The ConnX BBE64 directly supports 16-way radix-4 and radix-8 FFT butterfly steps, 16-32 complex tap FIR or 64-128 real tap FIR operations (256 real symmetric FIR taps) per cycle. The instruction set efficiently implements odd-radix DFTs commonly found in LTE and LTE-Advanced.
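As a reference for what a single radix-4 butterfly step computes, the sketch below implements the standard 4-point forward DFT kernel in scalar C, without twiddle factors. It is a textbook formulation, not BBE64 code; the `cint32` type and `radix4_butterfly` name are hypothetical, and inputs are assumed small enough that the 32-bit sums do not overflow.

```c
#include <stdint.h>

typedef struct { int32_t re, im; } cint32;  /* hypothetical complex type */

/* Radix-4 butterfly (4-point forward DFT, W = -j):
     y0 = x0 +  x1 + x2 +  x3
     y1 = x0 - jx1 - x2 + jx3
     y2 = x0 -  x1 + x2 -  x3
     y3 = x0 + jx1 - x2 - jx3
   Multiplying by j or -j only swaps and negates components, so the
   butterfly needs additions and subtractions but no multiplies. */
static void radix4_butterfly(const cint32 x[4], cint32 y[4])
{
    /* Sums and differences of the (x0, x2) and (x1, x3) pairs. */
    cint32 s02 = { x[0].re + x[2].re, x[0].im + x[2].im };
    cint32 d02 = { x[0].re - x[2].re, x[0].im - x[2].im };
    cint32 s13 = { x[1].re + x[3].re, x[1].im + x[3].im };
    cint32 d13 = { x[1].re - x[3].re, x[1].im - x[3].im };

    y[0].re = s02.re + s13.re;  y[0].im = s02.im + s13.im;
    y[2].re = s02.re - s13.re;  y[2].im = s02.im - s13.im;
    /* -j * d13 = (d13.im, -d13.re) */
    y[1].re = d02.re + d13.im;  y[1].im = d02.im - d13.re;
    /* +j * d13 = (-d13.im, d13.re) */
    y[3].re = d02.re - d13.im;  y[3].im = d02.im + d13.re;
}
```

In a full FFT, each butterfly output other than y0 is additionally multiplied by a twiddle factor between stages; that step is omitted here to keep the kernel itself visible.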

For further application acceleration, optional instruction packages are available for 32-way SIMD integer and fractional divide, 16-way SIMD reciprocal square root and de-spreading functions (64 complex MACs/cycle).

Extensibility

BBE64 supports custom ports (general purpose wire interfaces) and queue (FIFO) interfaces for efficient connection to coprocessors. These custom interfaces can be defined to match the interfaces of existing RTL hardware blocks. Buffered communication between two ConnX BBE64s or between a ConnX BBE64 and an RTL block can be automatically implemented using queue interfaces and is fully supported in programming and modeling tools.

Multiple parallel local memories can be connected directly to a ConnX BBE64 DSP using the Lookup interface, allowing more than 32 independent memory references per cycle. The ConnX BBE64 can also be further extended by defining new instructions, registers, and execution units to augment the existing instruction set.

Toolchain

A complete set of tools is available to support the ConnX BBE64. A comprehensive instruction set simulator (ISS) allows developers to quickly simulate and evaluate performance. The fast, functional TurboXim simulator option runs more than 40 times faster than the ISS for efficient software development and functional verification. SystemC (XTSC) and C-based (XTMP) system modeling can aid in full-chip simulations.

The toolset includes a high-performance C/C++ compiler with automatic vectorization to support the VLIW pipeline in the BBE64 core. This comprehensive toolset also includes the linker, assembler, debugger, profiler, an energy estimation tool and graphical visualization tools. All major back-end EDA flows are supported.
