ConnX BBE32UE

Very-Low Power for User Equipment Applications (LTE-Advanced and HSPA+)

The ConnX BBE32UE Baseband Engine is a high-performance DSP designed for next-generation communication baseband processors in LTE-Advanced and multi-standard User Equipment PHY (Layer 1) systems. As PHY system developers move from LTE to LTE-Advanced, they face up to a 5x increase in performance requirements within a very low power budget. Combined with new algorithms, field testing and pressure for fast time to market, this is leading developers to look to DSP cores for the flexibility and fast development they need, at very low power consumption.

The ConnX BBE32UE is built around a core vector pipeline of 32 MACs. The 16b x 16b multipliers, with signed and unsigned support, and the associated adder and multiplexer trees enable operations such as matrix computation, parallel complex multiply operations and signal filter structures. The results of these operations can be kept at full precision or truncated, rounded or saturated and shifted to meet the needs of different algorithms and implementations. High precision is a key factor in the ConnX BBE32UE: more signed multiplication results can be accumulated without loss of precision, which means fewer register spills and lower power.
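
The precision behaviour described above can be modelled in portable C. This is a plain-C reference sketch under stated assumptions, not the BBE32UE intrinsics or instruction semantics: it assumes Q15 inputs, a 40-bit signed accumulator (the accumulation width given in the Instruction Set section) that saturates on overflow, and round-half-up before narrowing back to 16 bits. All function and macro names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed 40-bit signed accumulator range: 32-bit products plus 8 guard bits. */
#define ACC40_MAX ((((int64_t)1) << 39) - 1)   /* 2^39 - 1 */
#define ACC40_MIN (-ACC40_MAX - 1)             /* -2^39 */

/* Clamp a value to the assumed 40-bit accumulator range. */
static int64_t sat40(int64_t v)
{
    if (v > ACC40_MAX) return ACC40_MAX;
    if (v < ACC40_MIN) return ACC40_MIN;
    return v;
}

/* Dot product of Q15 vectors: each 16b x 16b product is Q30 and is summed at
   full precision, saturating only if the 40-bit accumulator would overflow. */
static int64_t dot_q15_acc40(const int16_t *a, const int16_t *b, size_t n)
{
    int64_t acc = 0;
    for (size_t i = 0; i < n; ++i)
        acc = sat40(acc + (int32_t)a[i] * (int32_t)b[i]);
    return acc;
}

/* Right shift with floor semantics, written portably
   (right-shifting a negative signed value is implementation-defined in C). */
static int64_t shr_floor(int64_t v, unsigned s)
{
    return v >= 0 ? v >> s : -((-v + (((int64_t)1) << s) - 1) >> s);
}

/* Narrow a Q30 accumulator to Q15: add half an LSB, shift by 15, saturate to int16. */
static int16_t round_sat_q15(int64_t acc)
{
    int64_t r = shr_floor(acc + (((int64_t)1) << 14), 15);
    if (r > INT16_MAX) r = INT16_MAX;
    if (r < INT16_MIN) r = INT16_MIN;
    return (int16_t)r;
}
```

In this model, 511 worst-case products of (-32768)² = 2^30 can be summed before the accumulator saturates, whereas a 32-bit signed accumulator would overflow on the second one.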

The ConnX BBE32UE supports programming in C with a vectorizing compiler. Automatic vectorization of scalar C and full support for vector data types allow algorithms to be developed without programming at the assembly level. Operator overloading is supported, so developers can use the standard C operators naturally on real and complex vector data types.
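
As an illustration of the kind of scalar C the text refers to, the loop below is written in a form that auto-vectorizing compilers generally handle well: `restrict`-qualified pointers rule out aliasing, there is no loop-carried dependency, and the saturating add uses plain integer arithmetic. Whether and how the Tensilica compiler vectorizes this particular loop is an assumption, not something stated here; `add_sat16` is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

/* Element-wise saturating add of two int16 arrays.
   restrict tells the compiler the arrays do not overlap, and each iteration
   is independent of the others, which keeps the loop vectorization-friendly. */
static void add_sat16(int16_t *restrict out, const int16_t *restrict a,
                      const int16_t *restrict b, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        int32_t s = (int32_t)a[i] + b[i];   /* widen so the sum cannot overflow */
        if (s > INT16_MAX) s = INT16_MAX;   /* saturate high */
        if (s < INT16_MIN) s = INT16_MIN;   /* saturate low */
        out[i] = (int16_t)s;
    }
}
```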

Features

  • High-performance, very-low power DSP core with 32 simultaneous MACs/cycle
  • DSP architecture and instruction set specifically optimized for wireless communications user equipment applications
  • Supports a rich variety of complex arithmetic operations and efficient matrix processing with SIMD acceleration
  • Wide vector processing pipeline with up to 16-way SIMD support and 3-issue VLIW for efficient parallel load/store and compute operations
  • 320b wide vector register files supporting 20b x 16 and 40b x 8 vector types
  • 256b load/store unit and 256b load unit
  • 10-stage DSP pipeline architecture
  • Based on the Xtensa LX platform with rich customization and extension capabilities
  • Extensible interfaces with customized FIFO, Port and Lookup interfaces
  • Optional acceleration units available:
    • 16-way SIMD integer and fractional divide
    • 8-way SIMD reciprocal square root
    • De-spread (32-way), including Hadamard transforms
    • 3GPP soft bit demapping
    • LFSR generation
  • Optional support for non-aligned vector data load
  • Supported by the Tensilica advanced compiler and a world-class development toolchain
  • DSP function and application specific function libraries
  • Part of the Baseband ConnX family of DSP cores

Benefits

  • Offers very low power consumption for user equipment applications; meets very low power budgets for LTE-Advanced user equipment PHY (Layer 1) systems
  • High I/O throughput with custom, low-power interfaces to offload accelerators and single-cycle access to the ALUs
  • C-programming model; easy and quick software development
  • Efficient support of scalar, vector and matrix operations for real and complex data
  • Comprehensive Tensilica multicore tools infrastructure enables use in high-performance cellular handset applications

Instruction Set

The instruction set and architecture have been optimized for user equipment applications in LTE-Advanced and multi-standard communications, and tuned to meet the performance and computation requirements of this market. The result is a smaller, much more energy-efficient DSP core that is also better suited to User Equipment modem system integration. For example, large 1K- and 2K-point FFTs are generally run in offload accelerators, and the ConnX BBE32UE is optimized on the assumption that this type of offloading is in place.

A wide variety of load/store operations support seven addressing modes for 16b/32b scalar and vector data types. Optional unaligned loads and stores with masking deliver full-bandwidth loads and stores for unaligned data. Vector data management is supported with data packing and shifting.
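
One common way to deliver a vector from an unaligned address is to combine two aligned loads and shift the lanes into place. The sketch below models that in plain C; it is a generic illustration of the technique, not a description of how the BBE32UE hardware implements its optional unaligned loads. The 16-lane width (16 x 16b = 256b, matching the load unit width listed above) and the names `vec_t`, `load_aligned` and `load_unaligned` are assumptions.

```c
#include <stddef.h>
#include <stdint.h>

#define VLANES 16  /* hypothetical vector width: 16 lanes of 16 bits = 256 bits */

typedef struct { int16_t lane[VLANES]; } vec_t;

/* Aligned load: reads the VLANES elements of aligned block number `block`. */
static vec_t load_aligned(const int16_t *base, size_t block)
{
    vec_t v;
    for (size_t i = 0; i < VLANES; ++i)
        v.lane[i] = base[block * VLANES + i];
    return v;
}

/* Unaligned load at element `offset`, built from two aligned loads plus a lane
   shift. The buffer must extend to the end of the following aligned block. */
static vec_t load_unaligned(const int16_t *base, size_t offset)
{
    size_t block = offset / VLANES;
    size_t shift = offset % VLANES;
    vec_t lo = load_aligned(base, block);
    if (shift == 0)
        return lo;                        /* already aligned: one load suffices */
    vec_t hi = load_aligned(base, block + 1);
    vec_t out;
    for (size_t i = 0; i < VLANES; ++i) {
        size_t k = i + shift;             /* source lane across the lo:hi pair */
        out.lane[i] = (k < VLANES) ? lo.lane[k] : hi.lane[k - VLANES];
    }
    return out;
}
```

In a streaming loop, the `hi` vector of one iteration can be carried over as the `lo` vector of the next, so each additional unaligned vector costs a single aligned load.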

Multiply operations include complex and scalar 17b x 17b multiply, multiply-round, multiply-add and multiply-subtract functions. Complex-number functions include support for conjugate arithmetic and magnitude operations as well as full-precision arithmetic and saturated/rounded outputs. The ConnX BBE32UE includes extended precision with guard bits on all register data, full support for double-precision data, and 40-bit accumulation on all MAC operations without performance penalty. A wide variety of arithmetic, logical and shift operations are supported for up to 16 data words per cycle. There is full support for matrix multiplication, with acceleration for OFDM matrix operations.
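
The conjugate multiply-accumulate and magnitude operations mentioned above reduce to a few real multiplies per complex sample. The sketch below uses a hypothetical `cq15_t` type and 64-bit host integers for the accumulator, so it shows the arithmetic at full precision rather than modelling the core's 40-bit width or its rounding and saturation options. The function names are likewise hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct { int16_t re, im; } cq15_t;  /* hypothetical Q15 complex sample */
typedef struct { int64_t re, im; } cacc_t;  /* wide complex accumulator */

/* acc = sum over i of conj(a[i]) * b[i], using
   (ar - j*ai)(br + j*bi) = (ar*br + ai*bi) + j*(ar*bi - ai*br).
   Each product is widened to 64 bits before adding, so nothing overflows. */
static cacc_t cdot_conj(const cq15_t *a, const cq15_t *b, size_t n)
{
    cacc_t acc = {0, 0};
    for (size_t i = 0; i < n; ++i) {
        acc.re += (int64_t)a[i].re * b[i].re + (int64_t)a[i].im * b[i].im;
        acc.im += (int64_t)a[i].re * b[i].im - (int64_t)a[i].im * b[i].re;
    }
    return acc;
}

/* Squared magnitude |x|^2 = re^2 + im^2 (no square root). */
static int64_t magsq(cq15_t x)
{
    return (int64_t)x.re * x.re + (int64_t)x.im * x.im;
}
```

Multiplying by the conjugate is what correlation and matched filtering need; for example, `cdot_conj(x, x, n)` returns the energy of `x` in its real part with a zero imaginary part.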

For further application acceleration, optional instruction packages are available offering 16-way SIMD integer and fractional divide, as well as an 8-way SIMD reciprocal square root. There is also a de-spread acceleration package using 16 complex MACs/cycle that includes Hadamard transforms. For 3GPP applications, option packages for soft bit demapping and LFSR generation are available.
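
De-spreading with Walsh-Hadamard codes amounts to correlating the received chips against every row of a Hadamard matrix, which a fast Walsh-Hadamard transform does with n·log2(n) additions and subtractions instead of n² multiply-accumulates. The butterfly below is the textbook in-place transform in natural (Sylvester) order, written as plain C; it is not the interface of the optional de-spread package, and `fwht_i32` is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

/* In-place fast Walsh-Hadamard transform in natural (Sylvester) order:
   x[k] becomes sum over m of (-1)^popcount(k & m) * x[m].
   n must be a power of two. int32_t leaves room for the log2(n) bits of
   growth over 16-bit chip values at the small n used for de-spreading. */
static void fwht_i32(int32_t *x, size_t n)
{
    for (size_t h = 1; h < n; h <<= 1) {          /* butterfly span per stage */
        for (size_t i = 0; i < n; i += 2 * h) {
            for (size_t j = i; j < i + h; ++j) {
                int32_t a = x[j];
                int32_t b = x[j + h];
                x[j] = a + b;                     /* sum branch */
                x[j + h] = a - b;                 /* difference branch */
            }
        }
    }
}
```

For example, with n = 8 the chip sequence {1, -1, 1, -1, 5, -5, 5, -5}, which is 3 times Hadamard row 1 minus 2 times row 5, transforms to 24 at index 1, -16 at index 5 and 0 elsewhere; dividing by n recovers the two symbol values.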

Extensibility

ConnX BBE32UE supports custom Port (general-purpose wire) and Queue (FIFO) interfaces for efficient connection to offload accelerators. These custom interfaces can be defined to match the interfaces of existing offload accelerators. Buffered communication between two ConnX BBE32UE cores, or between a ConnX BBE32UE and an offload accelerator, can be implemented automatically using Queue interfaces and is fully supported in the programming and modeling tools. Because these interfaces are dedicated to the offload accelerator, they provide single-cycle access. This is particularly important for user equipment applications, where many functions may be moved to offload accelerators: the ConnX BBE32UE can access them in a single, cycle-deterministic operation, greatly reducing power consumption.
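
The buffering that a Queue interface provides can be pictured as a bounded FIFO between a producer and a consumer. The ring buffer below is a software sketch of that behaviour only: the depth, the 32-bit entry type and the `queue_t`, `q_push` and `q_pop` names are assumptions. Where this model returns `false` on a full or empty queue, a blocking hardware queue would typically stall the accessing side until space or data becomes available.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define QDEPTH 4u  /* hypothetical queue depth */

typedef struct {
    uint32_t data[QDEPTH];
    size_t head;   /* index of the oldest entry */
    size_t count;  /* number of valid entries */
} queue_t;

/* Producer side: appends v, or returns false if the queue is full
   (a blocking hardware queue would stall here instead). */
static bool q_push(queue_t *q, uint32_t v)
{
    if (q->count == QDEPTH)
        return false;
    q->data[(q->head + q->count) % QDEPTH] = v;
    q->count++;
    return true;
}

/* Consumer side: removes the oldest entry into *out, or returns false if the
   queue is empty (a blocking hardware queue would stall here instead). */
static bool q_pop(queue_t *q, uint32_t *out)
{
    if (q->count == 0)
        return false;
    *out = q->data[q->head];
    q->head = (q->head + 1) % QDEPTH;
    q->count--;
    return true;
}
```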

Local memories can be connected directly to a ConnX BBE32UE DSP using the Lookup interface, bypassing the processor memory bus. This allows efficient implementation of functions that require storage of multiple intermediate datasets. The ConnX BBE32UE can also be modified and extended by defining new instructions, registers and execution units to augment the existing instruction set.

Toolchain

A complete set of tools is available to support the ConnX BBE32UE. A comprehensive instruction set simulator (ISS) allows developers to quickly simulate and evaluate performance. The fast, functional TurboXim simulator option runs 40 to 80 times faster than the ISS for efficient software development and functional verification. SystemC (XTSC) and C-based (XTMP) system modeling can aid in full-chip simulations. Pin-level XTSC offers joint simulation of SystemC and RTL-level offload accelerator blocks for fast, cycle-accurate simulations.

The toolset includes a high-performance C/C++ compiler with automatic vectorization to support the VLIW pipeline in ConnX BBE32UE. This comprehensive tool set also includes the linker, assembler, debugger, profiler, an energy estimation tool and graphical visualization tools. All major back-end EDA flows are supported.
