ConnX BSP3 Dataplane Processor

Optimized Bit Stream Processor for Multi-standard Wireless Baseband PHY

Features

  • High-performance bit manipulation, with specific acceleration operations:
    • Bit Insertion / Extraction
    • Variable Bit Insertion / Extraction
    • Vectored Bit Selectors
    • Bit Stream Writer
    • Bit Stream Reader
  • 3-slot VLIW for efficient parallel load/store and bit compute operations
  • Dual 32b load/store unit supports configurations with up to 4MB addressable region
  • Architecture optimized for 16-bit, 20-bit, 32-bit and 40-bit vector operations
  • Advanced compiler technology for vectorizing C code and FLIX allocation, providing a C programming model
  • C operator overloading
  • 128-bit wide vector register files, allowing the loading, computation and storing of four 32-bit words, eight 16-bit words, or sixteen 8-bit words at a time
  • Based on the Xtensa LX platform with rich customization and extension capabilities
  • Extensible interfaces with custom-designed Queue, Port and Lookup interfaces
  • Instruction set can be further tailored to meet application needs

Benefits

  • Ability to off-load bit manipulation operations onto a dedicated processor
  • Very high bit-computation performance in a small area at low power
  • High performance in loading, computing and storing multiple blocks of data
  • High performance computation of CRC, interleavers, scramblers in the bit processing sections of LTE/4G PHY systems
  • Ease of development with C code programming model
  • Easy integration into existing hardware systems as well as multi-core systems
  • High I/O throughput with the ability to create custom interfaces to hardwired custom coprocessors / RTL blocks

Small Size, Low Power VLIW Processor for Wireless Baseband Bit Computation

The ConnX BSP3 (Bit Stream Processor) is a high-performance Dataplane Processor Unit (DPU) designed for use in SOC designs for next-generation communication baseband PHY systems such as those found in LTE and HSPA+ cellular radios and multi-standard broadcast receivers. It is specifically optimized for processing and manipulating bit streams, including operations for CRC, interleavers, scramblers and more.
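
As an illustration of the kind of bit-level workload mentioned above, the following is a minimal portable C sketch of a bit-serial CRC-24 using the LTE gCRC24A generator polynomial (0x864CFB, initial value 0). It is plain reference code, not ConnX BSP3 code: the function name crc24a is hypothetical, and no BSP3 instructions or intrinsics are used or implied.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* LTE gCRC24A generator polynomial without the implicit x^24 term. */
#define CRC24A_POLY 0x864CFBu

/* Bit-serial CRC-24 over `len` bytes, MSB first, initial value 0.
 * Hypothetical helper for illustration; not a BSP3 API. */
uint32_t crc24a(const uint8_t *data, size_t len)
{
    uint32_t crc = 0;

    assert(data != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint32_t)data[i] << 16;   /* align byte with top of CRC */
        for (int bit = 0; bit < 8; bit++) {
            crc <<= 1;
            if (crc & 0x1000000u)         /* a 1 was shifted out of bit 23 */
                crc ^= CRC24A_POLY;
            crc &= 0xFFFFFFu;             /* keep 24 bits */
        }
    }
    return crc;
}
```

Because the initial value is 0 and there is no final XOR, appending the three CRC bytes (most significant first) to a message and recomputing yields 0, which is a convenient self-check. Each inner-loop iteration is a shift, a test and a conditional XOR on single bits, which is the pattern a bit stream processor is meant to accelerate.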

The high compute requirements of bit processing and manipulation in 3.9G and 4G PHY systems call for new and innovative architectures with a high degree of parallelism and efficient I/O. The ConnX BSP3 meets these needs with an architecture and optimized instruction set that execute in parallel on a 3-issue VLIW machine. The dual 32-bit wide data path combined with the 3-issue VLIW allows load, compute and store in a single cycle. Additionally, it can load four vectors in one cycle. All of this fits in a small processor, giving very high performance per unit of area and power.

The ConnX BSP3 is an integral part of the Atlas Reference Architecture, along with the ConnX BBE16 DSP core and the ConnX Turbo16 DPU.

Instruction Set

The ConnX BSP3 is built on the baseline Xtensa RISC architecture, which implements a rich set of generic instructions optimized for efficient embedded processing. The power of the ConnX BSP3 comes from an optimized set of bit manipulation instructions coupled with VLIW parallel execution. In the ConnX BSP3, bit manipulation occurs on a 32-bit register array, so it is possible to perform four simultaneous bit manipulation operations on 8-bit data or two simultaneous operations on 16-bit data.
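
The idea of four simultaneous 8-bit operations inside one 32-bit register can be sketched in portable C. The example below extracts or inserts the same bit field in all four byte lanes of a 32-bit word at once. The function names (splat8, extract4x8, insert4x8) are hypothetical and chosen for this sketch; they do not correspond to documented BSP3 instructions, and the hardware may define lane semantics differently.

```c
#include <assert.h>
#include <stdint.h>

/* Replicate an 8-bit value into all four byte lanes of a 32-bit word. */
static uint32_t splat8(uint8_t v)
{
    return (uint32_t)v * 0x01010101u;
}

/* Extract `len` bits starting at bit `pos` from each 8-bit lane of `x`.
 * Each lane's field is returned right-aligned in the same lane. */
uint32_t extract4x8(uint32_t x, unsigned pos, unsigned len)
{
    assert(len >= 1 && pos + len <= 8);
    uint8_t lane_mask = (uint8_t)((1u << len) - 1u);
    /* The shift drags neighbouring-lane bits in; the mask discards them. */
    return (x >> pos) & splat8(lane_mask);
}

/* Insert the low `len` bits of each lane of `field` into the matching
 * lane of `dst` at bit `pos`, leaving all other bits of `dst` intact. */
uint32_t insert4x8(uint32_t dst, uint32_t field, unsigned pos, unsigned len)
{
    assert(len >= 1 && pos + len <= 8);
    uint32_t mask = splat8((uint8_t)(((1u << len) - 1u) << pos));
    return (dst & ~mask) | ((field << pos) & mask);
}
```

For example, extract4x8(0xA5C3F00F, 4, 4) returns 0x0A0C0F00, the high nibble of each byte. On a general-purpose core each call costs a handful of shift and mask operations; a dedicated instruction would presumably collapse these into one.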

Specific bit manipulation instructions are available for LTE baseband computation. Variable bit insertion/extraction can insert one or more bits into, or extract them from, 8-, 16- or 32-bit words. The Vector Bit Selector operates on four vectors in parallel; each takes two 8-bit inputs and selects bits from them to create an 8-bit output. The Bit Stream Reader and Writer read and write variable-length words from and to a serial stream, with full synchronization of loading and sending.
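
The behavior of a bit stream writer and reader can be modeled in a few lines of portable C. The sketch below is a software model only: the bitstream_t type and the bs_init, bs_write and bs_read functions are hypothetical names introduced here, the MSB-first bit order is an assumption, and the hardware synchronization of loading and sending is not modeled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical software model of a serial bit stream (not a BSP3 API). */
typedef struct {
    uint8_t *buf;       /* backing storage */
    size_t   cap_bits;  /* capacity in bits */
    size_t   pos;       /* next bit position, MSB first within each byte */
} bitstream_t;

void bs_init(bitstream_t *bs, uint8_t *buf, size_t nbytes)
{
    bs->buf = buf;
    bs->cap_bits = nbytes * 8;
    bs->pos = 0;
}

/* Append the low `nbits` (1..32) of `value` to the stream, MSB first. */
void bs_write(bitstream_t *bs, uint32_t value, unsigned nbits)
{
    assert(nbits >= 1 && nbits <= 32 && bs->pos + nbits <= bs->cap_bits);
    for (unsigned i = nbits; i-- > 0; ) {
        unsigned bit = (value >> i) & 1u;
        size_t byte = bs->pos >> 3;
        unsigned shift = 7u - (unsigned)(bs->pos & 7u);
        /* Clear the target bit, then set it to the new value. */
        bs->buf[byte] = (uint8_t)((bs->buf[byte] & ~(1u << shift)) | (bit << shift));
        bs->pos++;
    }
}

/* Read the next `nbits` (1..32) from the stream as an unsigned value. */
uint32_t bs_read(bitstream_t *bs, unsigned nbits)
{
    uint32_t v = 0;

    assert(nbits >= 1 && nbits <= 32 && bs->pos + nbits <= bs->cap_bits);
    for (unsigned i = 0; i < nbits; i++) {
        size_t byte = bs->pos >> 3;
        unsigned shift = 7u - (unsigned)(bs->pos & 7u);
        v = (v << 1) | ((bs->buf[byte] >> shift) & 1u);
        bs->pos++;
    }
    return v;
}
```

Writing the 3-bit value 5, the 9-bit value 0x1FF and the 4-bit value 2 in sequence fills two bytes with 0xBF 0xF2, and reading back with the same widths returns the original values. The per-bit loop is deliberately naive; it shows the work that variable-length stream access implies when no dedicated hardware support is available.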

Programming Model

ConnX BSP3 was designed to get maximum performance from standard C code. Tensilica's advanced compiler automatically vectorizes code and performs automatic data alignment.

Looping delays are kept to a minimum through zero-overhead looping. General arithmetic and logical operations can be automatically mapped to the SIMD engine by using operator overloading to give optimal performance. C intrinsics can be used for algorithms that cannot be automatically vectorized.
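
As a rough illustration of the coding style that suits automatic vectorization, the sketch below shows a word-wise XOR scrambling loop in standard C99. It assumes only generic compiler behavior: a countable loop, unit-stride accesses and restrict-qualified pointers that promise no aliasing. The function name scramble_words is hypothetical, and whether a given compiler vectorizes this loop depends on the compiler and its settings.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* XOR `n` input words with a scrambling sequence, one word per iteration.
 * `restrict` tells the compiler the three arrays do not overlap, which
 * removes a common obstacle to vectorization. Hypothetical example. */
void scramble_words(uint32_t *restrict out,
                    const uint32_t *restrict in,
                    const uint32_t *restrict seq,
                    size_t n)
{
    assert(n == 0 || (out != NULL && in != NULL && seq != NULL));
    for (size_t i = 0; i < n; i++)
        out[i] = in[i] ^ seq[i];
}
```

Applying the function twice with the same sequence restores the original data, since XOR is its own inverse. Because of the restrict qualifiers, callers must pass non-overlapping buffers.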

Multi-core Integration

ConnX BSP3 is a building block within the Atlas Reference Architecture. As a result, integration into and support of multi-core systems was a key factor in the definition of the ConnX BSP3 DPU. Connectivity to other cores can be supported by a memory-mapped AMBA AXI interface for a shared bus architecture. The Tensilica PIF interface can also be used, either as a shared bus or as a point-to-point connection, offering better performance and lower power on high-data-rate paths. Control and synchronization can be done via PIF as well as Queue (FIFO) and Port (GPIO) interfaces.

These multi-core systems are all supported in the Tensilica development tool chain, with multi-core system simulation, profiling and debug.

Extensibility

The ConnX BSP3 supports custom-defined Port and Queue interfaces. The Ports can be defined to match the interfaces of existing RTL hardware blocks. Buffered communication between two ConnX BSP3 DPUs, or between a ConnX BSP3 and an RTL block, can be automatically implemented using Queue interfaces, which are fully supported in the programming and modeling tools.

Local memories can be connected directly to the ConnX BSP3 using the Lookup interface, bypassing the processor memory bus. This efficiently implements functions that require storage of multiple intermediate datasets. The ConnX BSP3 can also be modified and extended by defining new instructions, registers, and execution units to augment the existing instruction set.

Toolchain

A complete set of tools is available to support the ConnX BSP3. A comprehensive instruction set simulator (ISS) is included as part of the Xplorer IDE, which allows developers to quickly simulate and evaluate performance. The fast, functional TurboXim simulator option achieves speeds that are 40 to 80 times faster than the ISS for efficient software development and functional verification. SystemC (XTSC) and C-based (XTMP) system modeling at the transaction level and pin level can aid in full-chip simulations.

The toolset includes a high-performance C/C++ compiler with automatic vectorization to support the VLIW pipeline in the ConnX BSP3. This comprehensive toolset also includes the linker, assembler, debugger, profiler, an energy estimation tool and graphical visualization tools. All major back-end EDA flows are supported.

