The ConnX BSP3 (Bit Stream Processor) is a high-performance Dataplane Processor Unit (DPU) designed for use in SOC designs for next-generation communication baseband PHY systems such as those found in LTE and HSPA+ cellular radios and multi-standard broadcast receivers. It is specifically optimized for processing and manipulating bit streams, including operations for CRC, interleavers, scramblers and more.
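As a point of reference for the kind of bit-stream work described above, here is a minimal portable C sketch of a bit-serial CRC. It is an illustration only: the brief does not say which CRC variants are targeted, so the LTE CRC24A generator polynomial is an assumed example, the function name crc24a is hypothetical, and nothing here uses ConnX BSP3 instructions or intrinsics.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* LTE CRC24A generator polynomial; the x^24 term is implicit.
 * Chosen here as an assumed example, not taken from the brief. */
#define CRC24A_POLY 0x864CFBu

/* Bit-serial CRC over a byte buffer: MSB first, zero initial state,
 * no final XOR. Returns the 24-bit remainder. */
uint32_t crc24a(const uint8_t *data, size_t len)
{
    assert(data != NULL || len == 0);

    uint32_t crc = 0;
    for (size_t i = 0; i < len; i++) {
        /* Feed the next byte into the top 8 bits of the 24-bit register. */
        crc ^= (uint32_t)data[i] << 16;
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 0x800000u)
                crc = (crc << 1) ^ CRC24A_POLY;
            else
                crc <<= 1;
            crc &= 0xFFFFFFu;  /* keep only 24 bits */
        }
    }
    return crc;
}
```

The inner loop processes one bit per iteration, which is the kind of serial dependency that dedicated bit-manipulation instructions are meant to shorten. Appending the three CRC bytes (most significant first) to a message and running crc24a over the result yields zero, which is a convenient self-check.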
The bit processing and manipulation workloads of 3.9G and 4G PHY systems demand architectures with a high degree of parallelism and efficient I/O. The ConnX BSP3 meets these needs with an optimized instruction set and the parallel execution of a 3-issue VLIW machine. The dual 32-bit wide data path combined with the 3-issue VLIW allows single-cycle load, compute and store. Additionally, it can load four vectors in one cycle. All of this fits in a small processor, giving very high performance per unit of area and power.
The ConnX BSP3 is an integral part of the Atlas Reference Architecture, along with the ConnX BBE16 DSP core and the ConnX Turbo16 DPU.
The ConnX BSP3 is built on the baseline Xtensa RISC architecture, which implements a rich set of generic instructions optimized for efficient embedded processing. The power of the ConnX BSP3 comes from an optimized set of bit manipulation instructions coupled with VLIW parallel execution. In the ConnX BSP3, bit manipulation occurs on a 32-bit register array, so it is possible to perform four simultaneous bit manipulation operations on 8-bit data or two simultaneous operations on 16-bit data.
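The idea of treating one 32-bit register as four 8-bit lanes or two 16-bit lanes can be sketched in plain C. The example below reverses the bit order inside each lane with shifts and masks; the operation and the function names rev_bits_4x8 and rev_bits_2x16 are assumptions chosen for illustration, not documented ConnX BSP3 instructions.

```c
#include <stdint.h>

/* Reverse the bit order inside each of the four 8-bit lanes of v.
 * Every step swaps neighbouring groups (1, 2, then 4 bits wide) and
 * never moves a bit across a byte boundary. */
uint32_t rev_bits_4x8(uint32_t v)
{
    v = ((v >> 1) & 0x55555555u) | ((v & 0x55555555u) << 1);
    v = ((v >> 2) & 0x33333333u) | ((v & 0x33333333u) << 2);
    v = ((v >> 4) & 0x0F0F0F0Fu) | ((v & 0x0F0F0F0Fu) << 4);
    return v;
}

/* Reverse the bit order inside each of the two 16-bit lanes of v:
 * reverse within bytes, then swap the two bytes of each halfword. */
uint32_t rev_bits_2x16(uint32_t v)
{
    v = rev_bits_4x8(v);
    return ((v >> 8) & 0x00FF00FFu) | ((v & 0x00FF00FFu) << 8);
}
```

For instance, rev_bits_4x8(0x01020480) gives 0x80402001: each byte is reversed independently, so one 32-bit operation sequence does the work of four 8-bit ones.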
Specific bit manipulation instructions are available for LTE baseband computation. Variable bit insertion/extraction can insert/extract one or more bits into/from 8-, 16- or 32-bit words. The Vector Bit Selector operates on four vectors in parallel; each takes two 8-bit inputs and selects bits from them to create an 8-bit output. The Bit Stream Reader and Writer read and write variable-length words from and into a serial stream, with full synchronization of loading and sending.
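To make the reader/writer behavior concrete, the following is a small software model in portable C of writing and reading variable-length words in a serial bit stream. It is a sketch under stated assumptions: the MSB-first bit order, the bitstream_t type and the bs_* function names are all hypothetical, and the model does not represent the actual instruction semantics or the hardware synchronization of loading and sending.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical software model of a serial bit stream. */
typedef struct {
    uint8_t *buf;  /* caller-owned backing storage */
    size_t   cap;  /* capacity in bytes */
    size_t   pos;  /* current position in bits */
} bitstream_t;

void bs_init(bitstream_t *bs, uint8_t *buf, size_t cap)
{
    bs->buf = buf;
    bs->cap = cap;
    bs->pos = 0;
}

/* Append the low nbits of value to the stream, MSB first. */
void bs_write(bitstream_t *bs, uint32_t value, unsigned nbits)
{
    assert(nbits >= 1 && nbits <= 32);
    assert(bs->pos + nbits <= bs->cap * 8);

    for (unsigned i = nbits; i-- > 0; ) {
        size_t   byte  = bs->pos >> 3;
        unsigned shift = 7u - (unsigned)(bs->pos & 7u);
        unsigned bit   = (unsigned)((value >> i) & 1u);
        /* Clear the target bit, then insert the new one. */
        bs->buf[byte] = (uint8_t)((bs->buf[byte] & ~(1u << shift)) | (bit << shift));
        bs->pos++;
    }
}

/* Extract the next nbits from the stream, MSB first. */
uint32_t bs_read(bitstream_t *bs, unsigned nbits)
{
    assert(nbits >= 1 && nbits <= 32);
    assert(bs->pos + nbits <= bs->cap * 8);

    uint32_t value = 0;
    for (unsigned i = 0; i < nbits; i++) {
        size_t   byte  = bs->pos >> 3;
        unsigned shift = 7u - (unsigned)(bs->pos & 7u);
        value = (value << 1) | ((uint32_t)(bs->buf[byte] >> shift) & 1u);
        bs->pos++;
    }
    return value;
}
```

Writing a 3-bit, a 9-bit and a 4-bit field in sequence packs them into exactly two bytes, and reading with the same widths returns the original values. The model handles one bit per loop iteration; the appeal of dedicated instructions is that a whole variable-length field can be handled at once.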
ConnX BSP3 was designed to get maximum performance from standard C code. Tensilica's advanced compiler automatically vectorizes code and performs automatic data alignment.
Loop overhead is kept to a minimum through zero-overhead looping. General arithmetic and logical operations can be automatically mapped to the SIMD engine by using operator overloading to give optimal performance. C intrinsics can be used for algorithms that cannot be automatically vectorized.
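The kind of standard C loop that a vectorizing compiler can typically handle looks like the sketch below: a fixed-stride loop with no cross-iteration dependency and non-aliasing pointers. The scrambling-by-XOR use case and the function name scramble_words are assumptions for illustration. The code deliberately uses no Tensilica vector types or intrinsics, and whether a particular compiler vectorizes it depends on that compiler and its options.

```c
#include <stddef.h>
#include <stdint.h>

/* XOR a block of packed data bits with a precomputed scrambling sequence.
 * The restrict qualifiers tell the compiler that the three buffers do not
 * overlap, which removes one common obstacle to automatic vectorization. */
void scramble_words(uint32_t *restrict out,
                    const uint32_t *restrict in,
                    const uint32_t *restrict seq,
                    size_t nwords)
{
    for (size_t i = 0; i < nwords; i++)
        out[i] = in[i] ^ seq[i];
}
```

Because XOR is its own inverse, applying the same sequence a second time recovers the original data, which makes the routine usable for descrambling as well.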
ConnX BSP3 is a building block within the Atlas Reference Architecture. As a result, the integration and support of multi-core systems is a key factor in the definition of the ConnX BSP3 DPU. Connectivity to other cores can be supported by a memory-mapped AMBA AXI interface for a shared bus architecture. The Tensilica PIF interface can also be used, either as a shared bus or as a point-to-point connection scheme that offers better performance and lower power on high data rate paths. Control and synchronization can be done via PIF as well as Queue (FIFO) and Port (GPIO) interfaces.
These multi-core systems are all supported in the Tensilica development tool chain, with multi-core system simulation, profiling and debug. The ConnX BSP3 supports custom-defined Port and Queue interfaces. The Ports can be defined to match the interfaces of existing RTL hardware blocks. Buffered communication between two ConnX BSP3 DPUs, or between a ConnX BSP3 and an RTL block, can be automatically implemented using Queue interfaces and is fully supported in the programming and modeling tools.
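The buffered, first-in first-out behavior of a Queue connection can be pictured with a simple software model. The ring buffer below is only a conceptual sketch in portable C: the depth of 4, the queue_model_t type and the queue_push and queue_pop names are hypothetical, and the real Queue interfaces are defined and generated through the Tensilica tools rather than written by hand like this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define QUEUE_DEPTH 4u  /* hypothetical depth, chosen for illustration */

/* Hypothetical software model of a FIFO between a producer and a consumer. */
typedef struct {
    uint32_t slot[QUEUE_DEPTH];
    size_t   head;   /* index of the oldest entry */
    size_t   count;  /* number of entries currently stored */
} queue_model_t;

/* Producer side: returns false when the queue is full. */
bool queue_push(queue_model_t *q, uint32_t word)
{
    assert(q != NULL);
    if (q->count == QUEUE_DEPTH)
        return false;
    q->slot[(q->head + q->count) % QUEUE_DEPTH] = word;
    q->count++;
    return true;
}

/* Consumer side: returns false when the queue is empty. */
bool queue_pop(queue_model_t *q, uint32_t *word)
{
    assert(q != NULL && word != NULL);
    if (q->count == 0)
        return false;
    *word = q->slot[q->head];
    q->head = (q->head + 1) % QUEUE_DEPTH;
    q->count--;
    return true;
}
```

A zero-initialized queue_model_t starts out empty. The producer and consumer only share the queue itself, which is the property that lets two cores, or a core and an RTL block, exchange data without going through a shared memory bus.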
Local memories can be connected directly to the ConnX BSP3 using the Lookup interface, bypassing the processor memory bus. This efficiently implements functions that require storage of multiple intermediate datasets. The ConnX BSP3 can also be modified and extended by defining new instructions, registers, and execution units to augment the existing instruction set.
A complete set of tools is available to support the ConnX BSP3. A comprehensive instruction set simulator (ISS) is included as part of the Xplorer IDE, which allows developers to quickly simulate and evaluate performance. The fast, functional TurboXim simulator option runs 40 to 80 times faster than the ISS for efficient software development and functional verification. SystemC (XTSC) and C-based (XTMP) system modeling at the transaction level and pin level can aid in full-chip simulations.
The toolset includes a high-performance C/C++ compiler with automatic vectorization to support the VLIW pipeline in the ConnX BSP3. This comprehensive toolset also includes the linker, assembler, debugger, profiler, an energy estimation tool and graphical visualization tools. All major back-end EDA flows are supported.