Baseband & RF Signal Processing
Tackling the Hard Tasks in the Dataplane
When you need to put the WOW into your SoC designs, look to Cadence and our Tensilica DPUs. We offer more ways to perform complex signal processing than any other company. Cadence offers a full range of DPUs and DSPs that can provide the best combination of high performance, low power, and small area, exactly tailored to your application.
From the lightweight dual-MAC ConnX D2 to the super-high-performance 64-MAC ConnX BBE64, these off-the-shelf, ready-to-run designs provide industry-leading, high-performance, compact, low-power engines for applications from SmartGrid to 802.11ac modems and LTE-Advanced. We also offer several special-function DPUs so you don't have to design these common functions yourself, which speeds your design effort.
Did You Know?
No matter what solution you choose, remember that it's based on our 32-bit Xtensa RISC processor and toolset. Unlike a traditional fixed-configuration DSP core, all Tensilica DSPs and DPUs are fully:
- Configurable - Select the pre-built functions you need with full C language, library and verification support.
- Extensible - Extend your instruction set by adding custom instructions using the TIE language. The results are automatically integrated into the programming tools with full verification support. Custom ports allow your hardware accelerators to be integrated directly into the core, where they appear to the programmer as a standard instruction.
- Scalable - Configurable I/O ports and memory allow you to easily scale your performance from a simple single-core design to a sophisticated, direct-ported multi-core solution.
Whether your need is for a single core, a homogeneous multi-core solution or a highly optimized heterogeneous mix of DSPs, DPUs and hardware accelerator blocks, our ConnX family of DSPs and DPUs supported by the Xtensa tool chain is the ideal place to get started on your baseband platform design.
In addition to our family of DSPs, we partner with leading system integration vendors to help you assemble a full system solution. See the System Solutions tab for more details.
Putting the WOW in Signal Processing
Cadence supports your development needs from complete system solution reference designs to full custom DSP/accelerator development based on the Xtensa framework. The ConnX DSPs form the basis of our Tensilica baseband processing reference designs, provide fully verified configurable DSPs for those customers who wish to develop their own baseband processing platforms, and can serve as a base for highly customized DSPs.
The ConnX BBE family of DSPs provides a full range of high-performance, low-power signal processing solutions. Beyond the coarse granularity between the different 16/32/64 SIMD vector sizes, each specific BBE has a number of push-button configurable options that allow the designer to optimize the core for the specific function at hand. In an infrastructure solution, for example, the designer might opt for flexibility and enable all the options for a BBE32, resulting in a more powerful and flexible core. In a User Equipment solution, the system designer may put a greater emphasis on hardware accelerators and so may disable the corresponding options, such as FIR Filtering.
Complementing the ConnX BBE DSPs are special function DPUs which target high runner algorithms that consume enough resources (often of a specialized type) to justify their own DPU engine. DPUs offer programmable, low-power, high-performance alternatives to hard-coded ASIC accelerators that otherwise would limit the overall flexibility of your system design.
ConnX BBE16 - full featured 16 MACs per cycle
The ConnX BBE16 Baseband Engine is a high-performance DSP that combines an 8-way SIMD, 3-issue VLIW processing pipeline with a rich and extensible set of interfaces. It is built around a core vector pipeline made of sixteen 18b x 18b MACs. These multipliers and the associated adder and multiplexer trees enable operations such as FFT butterflies, parallel complex multiply operations and signal filter structures. The results of these operations can be full precision or truncated/rounded/saturated and shifted to meet the needs of different algorithms and implementations.
The instruction set has been optimized for performance of DSP kernel operations such as FFT and FIR as well as matrix multiplies. Acceleration has been added for a wide range of key wireless functions giving very high performance in wireless applications.
- Instruction set optimized for high-performance OFDM and MIMO based communication baseband designs
- Single-cycle radix-4 FFT butterfly, 4 complex tap FIR and 16 real tap FIR operations
- Dual 128b load/store units
- High I/O throughput with the ability to create custom interfaces to hardwired custom coprocessors and RTL blocks
- Leading performance per area and power
- Optional SIMD Divide/Recip Sqrt, De-spread (8-way)
- Optimized DSP Kernel library
See the ConnX BBE16 Product Brief for more information
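The pairing of "4 complex tap FIR" with "16 real tap FIR" in the list above follows from the arithmetic of a complex multiply, which needs four real multiplies. The scalar C sketch below is only a reference model of that arithmetic, not ConnX BBE16 code: the type and function names (`cplx16_t`, `cplx_acc_t`, `cplx_mac`, `cplx_fir_sample`) are hypothetical, and the 64-bit accumulator simply stands in for a "full precision" result.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical types for illustration; not part of any Tensilica header. */
typedef struct { int16_t re, im; } cplx16_t;   /* complex sample or coefficient */
typedef struct { int64_t re, im; } cplx_acc_t; /* wide, full-precision accumulator */

/* One complex multiply-accumulate. It costs four real multiplies, so a
 * bank of 16 real MACs can sustain four of these at a time. */
void cplx_mac(cplx_acc_t *acc, cplx16_t a, cplx16_t b)
{
    acc->re += (int64_t)a.re * b.re - (int64_t)a.im * b.im;
    acc->im += (int64_t)a.re * b.im + (int64_t)a.im * b.re;
}

/* One output sample of a complex FIR filter.
 * x holds the ntaps most recent samples, oldest first, so h[0] is
 * applied to x[ntaps - 1] (the newest sample). */
cplx_acc_t cplx_fir_sample(const cplx16_t *x, const cplx16_t *h, size_t ntaps)
{
    cplx_acc_t acc = {0, 0};
    for (size_t k = 0; k < ntaps; k++)
        cplx_mac(&acc, h[k], x[ntaps - 1 - k]);
    return acc;
}
```

With four complex taps, the loop body runs four times and performs 16 real multiplies in total, which matches the 16-MAC figure quoted for a single cycle.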
ConnX BBE32 - full-featured, flexible 32-MAC DSP
The ConnX BBE32 is specifically designed to support the needs of software baseband processing. It delivers the processing capacity needed in multi-user systems and has a broad instruction set specifically targeted at the full range of algorithms typical of 3G, 4G and Wi-Fi systems. This allows the system architect to minimize or even eliminate hardware accelerators from the processing chain. The result is a system that is more flexible, carries lower risk, reaches market faster and has a longer platform life, since most if not all changes can be implemented as software updates.
- High performance, low power over a broad range of algorithms including support for LTE and HSPA+
- 32-way MAC, 16-way ALU SIMD engines
- 32-bit scalar ALU
- 4-issue VLIW for parallel load/store, MAC and ALU ops
- Optimized instructions for:
- Complex arithmetic
- Polynomial evaluation
- Matrix multiplication
- Bit oriented operations
- Vector compression and expansion
- Predicated vector instructions
- Configurable instruction set with 10 predefined, pre-verified vector packages, ranging from FFT and FIR to integer divide
See the ConnX BBE32 Product Brief for more information
ConnX BBE32UE - optimized, low-power 32-MAC DSP
As PHY system developers move from LTE to LTE-Advanced, they face the challenge of up to a 5x performance increase within a very low power budget. The ConnX BBE32UE is built around a core vector pipeline made of 32 MACs. The 16b x 16b multipliers, with signed and unsigned support, and the associated adder and multiplexer trees enable operations such as matrix computation, parallel complex multiply operations and signal filter structures. High precision is a key factor with the ConnX BBE32UE and, as a result, more signed multiplication results can be accumulated without loss of precision, with fewer register spills and lower power.
The ConnX BBE32UE supports programming in C with a vectorizing compiler. Automatic vectorization of scalar C and full support for vector data types allows the development of algorithms without the need to program at the assembly level. Native C operator overloading is supported for natural programming with standard C operators on real and complex vector data types.
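As an illustration of the kind of scalar C that a vectorizing compiler can typically map onto SIMD hardware, here is a minimal sketch. It uses only standard C99; no ConnX-specific vector types, pragmas or compiler flags are shown because the text above does not name any. The function name `scale_add_q15` and the Q15 scaling are assumptions made for the example.

```c
#include <stdint.h>
#include <stddef.h>

/* out[i] = a[i] + (b[i] * gain in Q15).
 * The restrict qualifiers promise that the arrays do not overlap, which
 * removes a common obstacle to automatic vectorization. The loop has a
 * simple trip count and no dependence between iterations. */
void scale_add_q15(int16_t *restrict out, const int16_t *restrict a,
                   const int16_t *restrict b, int16_t gain, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        /* Q15 multiply; assumes an arithmetic right shift for negative values. */
        int32_t scaled = ((int32_t)b[i] * gain) >> 15;
        /* Plain narrowing with no saturation; a production kernel would clamp. */
        out[i] = (int16_t)(a[i] + scaled);
    }
}
```

Each iteration is independent of the others, so a compiler is free to process as many elements per instruction as the target's SIMD width allows.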
- Wide 10-stage vector pipeline with up to 16-way SIMD support and 3-issue VLIW for efficient parallel load/store and compute operations
- 256b load/store unit and 256b load unit
- 320b wide vector register files supporting 20b x 16 and 40b x 8 vector types
- Very low power consumption for LTE-Advanced handset PHYs
- High I/O throughput with custom, low-power interface to offload accelerators, single-cycle access to the ALUs
See the ConnX BBE32UE Product Brief for more details
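The statement that more multiplication results can be accumulated without loss of precision comes down to guard bits. The sketch below assumes, which the text does not state outright, that each 16b x 16b product occupies 32 bits and is accumulated in one 40-bit lane of the 40b x 8 vector type. The helper names are hypothetical, and a 64-bit integer stands in for the hardware accumulator.

```c
#include <stdint.h>

/* Conservative number of full-range products that can be summed before an
 * accumulator of acc_bits can overflow: 2^(acc_bits - product_bits).
 * Intended for differences well below 64 bits. */
uint64_t safe_accumulations(unsigned acc_bits, unsigned product_bits)
{
    if (acc_bits < product_bits)
        return 0;
    return UINT64_C(1) << (acc_bits - product_bits);
}

/* Nonzero if v is representable as a signed two's-complement value of
 * `bits` bits (1 <= bits <= 63). */
int fits_signed(int64_t v, unsigned bits)
{
    int64_t max = (INT64_C(1) << (bits - 1)) - 1;
    int64_t min = -max - 1;
    return v >= min && v <= max;
}

/* Sum `count` copies of the largest 16b x 16b product,
 * (-32768) * (-32768) = 2^30, in a 64-bit model of the accumulator. */
int64_t worst_case_sum(uint64_t count)
{
    const int64_t product = (int64_t)INT16_MIN * INT16_MIN;
    int64_t acc = 0;
    for (uint64_t i = 0; i < count; i++)
        acc += product;
    return acc;
}
```

Under these assumptions, `safe_accumulations(40, 32)` gives 256: a 40-bit accumulator absorbs 256 worst-case products, whereas a 32-bit accumulator overflows on the second one and would need intermediate scaling or spills.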
ConnX BBE64 - with throughput of 64 MACs per cycle
The ConnX BBE64 DSP is based on an ultra-high-performance architecture designed for LTE-Advanced and other next-generation 5G cellular radios and multi-standard broadcast receivers. It combines a 32-way SIMD, 4-issue VLIW processing pipeline with a rich and extensible set of interfaces.
Built around a set of versatile pipelined execution units, the ConnX BBE64 includes flexible precision real and complex multiply-add, adders, bit manipulation, shift and normalization, select, shuffle and interleave units. The results of all these operations can be extended precision of 40 bits per component or truncated/rounded/saturated and shifted to meet the needs of different algorithms and implementations.
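To make "truncated/rounded/saturated and shifted" concrete, here is a minimal C sketch of narrowing a wide accumulator to a 16-bit result. The round-half-up mode and the function name `round_sat16` are assumptions for illustration; the text does not specify which rounding modes the ConnX BBE64 provides.

```c
#include <stdint.h>

/* Narrow a wide accumulator to 16 bits: shift right by `shift` with
 * round-half-up, then saturate to the int16_t range.
 * Assumes an arithmetic right shift for negative values and shift < 63. */
int16_t round_sat16(int64_t acc, unsigned shift)
{
    if (shift > 0)
        acc = (acc + (INT64_C(1) << (shift - 1))) >> shift; /* round half up */
    if (acc > INT16_MAX)
        return INT16_MAX; /* clamp instead of wrapping */
    if (acc < INT16_MIN)
        return INT16_MIN;
    return (int16_t)acc;
}
```

Dropping the rounding constant gives plain truncation, and dropping the two clamps gives wrap-around behavior, which is why algorithms differ in which combination they need.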
- High-performance DSP with 64 simultaneous 18x18-bit MACs/cycle
- Advanced 10-stage pipeline architecture
- 16-way radix-8 FFT butterfly steps, 16-32 complex tap FIR or 64-128 real tap FIR operations (256 real symmetric FIR taps) per cycle
- Dual 512-bit load/store units
- Parallel register files for 10b, 20b and 40b data types give more capability for easier compilation and higher performance
- High I/O throughput with the ability to create custom interfaces to hardwired custom coprocessors and RTL blocks
- High efficiency allows operation at low clock speeds for low-power designs
See the ConnX BBE64 Product Brief for more information
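The "256 real symmetric FIR taps" figure quoted alongside the 128-tap figure is consistent with the usual folding trick for symmetric filters: the two samples that share a coefficient are added first, so each multiply covers two taps. Whether the ConnX BBE64 implements it exactly this way is not stated above, so the following C is only a sketch of the technique, with hypothetical function names.

```c
#include <stdint.h>
#include <stddef.h>

/* Direct form: one multiply per tap. x and h each hold ntaps values. */
int64_t fir_direct(const int16_t *x, const int16_t *h, size_t ntaps)
{
    int64_t acc = 0;
    for (size_t k = 0; k < ntaps; k++)
        acc += (int32_t)h[k] * x[k];
    return acc;
}

/* Folded form for symmetric coefficients (h[k] == h[ntaps - 1 - k],
 * ntaps even): pre-add the two samples that share a coefficient, then
 * multiply once. Only the first ntaps / 2 coefficients are read. */
int64_t fir_symmetric(const int16_t *x, const int16_t *h, size_t ntaps)
{
    int64_t acc = 0;
    for (size_t k = 0; k < ntaps / 2; k++) {
        int32_t pre_add = (int32_t)x[k] + x[ntaps - 1 - k]; /* up to 17 bits */
        acc += (int64_t)h[k] * pre_add;
    }
    return acc;
}
```

Both functions return the same value for a symmetric filter, but the folded form uses half as many multiplies. The pre-add of two 16-bit samples yields a 17-bit value, which fits within an 18-bit multiplier input.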
Specialized Baseband DPUs
ConnX BSP3 - Bit stream processor
The ConnX BSP3 (Bit Stream Processor) is designed for use in baseband PHY systems found in LTE and HSPA+ cellular radios and multi-standard broadcast receivers. It is specifically optimized for processing and manipulating bit streams, including operations for CRC, interleavers, scramblers and more.
The ConnX BSP3 offers an architecture and optimized instruction set with the parallel execution of a 3-issue VLIW machine. The dual 32-bit-wide data path combined with the 3-issue VLIW allows single-cycle load, compute and store. Additionally, it can load four vectors in one cycle. This is all done in a small processor, giving very high performance per area and power.
- Dual 32b load/store supports up to 4MB addressable region
- Optimized for 16-, 20-, 32- and 40-bit vector operations
- 128-bit-wide vector files, allowing the loading, computation and storing of four 32-bit words, eight 16-bit words, or sixteen 8-bit words at a time
- Very high performance for small area and low power for bit computation
See the ConnX BSP3 Product Brief for more information
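For a sense of the bit-level work involved in operations such as CRC, here is a bit-serial CRC-16 in portable C. It is a generic reference, not ConnX BSP3 code: the function name is hypothetical, and the generator x^16 + x^12 + x^5 + 1 with a zero initial value is chosen as a well-known example.

```c
#include <stdint.h>
#include <stddef.h>

/* Bit-serial CRC-16: generator x^16 + x^12 + x^5 + 1 (0x1021), zero
 * initial value, most significant bit first. Every input bit costs a
 * shift, a test and a conditional XOR. */
uint16_t crc16_bitwise(const uint8_t *data, size_t len)
{
    uint16_t crc = 0;
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)(data[i] << 8); /* bring the next byte into the top bits */
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 0x8000u)
                crc = (uint16_t)((crc << 1) ^ 0x1021u);
            else
                crc = (uint16_t)(crc << 1);
        }
    }
    return crc;
}
```

On a general-purpose core the inner loop runs eight times per byte, which is the kind of per-bit overhead a processor with dedicated bit-manipulation instructions is meant to remove. A convenient self-check is that running the function over a message followed by its own CRC (high byte first) returns zero.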
ConnX SSP16 - Soft stream processor
The ConnX SSP16 (Soft Stream Processor) is specifically optimized for processing streams of soft bits, which are 4- to 8-bit representations of transmitted bits. Soft bits are generated by the demodulator in the receive chain and used in HARQ pre-processing and header decoding. The ConnX SSP16 meets these needs with a 16-way SIMD, 3-slot VLIW processing pipeline optimized for 10-bit and 8-bit processing (10 bits provide the required precision for multiple operations on 8-bit data).
The dual 128-bit wide data path allows 16-way loading and operations for higher performance. The ConnX SSP16 also supports specialized functions such as the transpose memory module and the Viterbi accelerator module.
- Supports 3-bit, 8-bit, and 16-bit scalar data types and 8-bit vector data types that use 10-bit internal representation per element providing two guard bits
- Extensible interfaces with custom designed Port, Queue and Lookup interfaces
- Dual 128-bit load/store unit supports up to 4MB addressable region
- Optimized for small size and low power
See the ConnX SSP16 Product Brief for more information
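The two guard bits mentioned above can be illustrated with a small soft-bit combining sketch, of the kind used when a retransmission is added to an earlier reception. This is plain C with hypothetical function names, not ConnX SSP16 code; an `int` stands in for the 10-bit internal representation.

```c
#include <stdint.h>
#include <stddef.h>

/* Nonzero if v fits a signed 10-bit value, i.e. the range [-512, 511]. */
int fits_signed10(int v)
{
    return v >= -512 && v <= 511;
}

/* Clamp an intermediate result back to the signed 8-bit soft-bit range. */
int8_t sat8(int v)
{
    if (v > INT8_MAX)
        return INT8_MAX;
    if (v < INT8_MIN)
        return INT8_MIN;
    return (int8_t)v;
}

/* Combine two receptions of the same soft bits element by element.
 * The sum is formed at higher precision and saturated only on output. */
void combine_soft_bits(int8_t *out, const int8_t *first,
                       const int8_t *retx, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = sat8(first[i] + retx[i]);
}
```

Two guard bits are exactly enough to add four 8-bit values without overflow: 4 x 127 = 508 and 4 x (-128) = -512 both fit in 10 bits, while a fifth value may not.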
ConnX Turbo16MS - Multistandard turbo processor
The ConnX Turbo16MS is a high-performance dataplane processor unit (DPU) specifically designed for decoding of LTE Turbo codes on data streams of up to 150 Mbps and HSPA+ data streams of up to 85 Mbps. This performance is required for 3.9G and 4G cellular radios and multi-standard broadcast receivers.
ConnX Turbo16MS has been optimized in two areas. First, a customized instruction set has been developed for LTE and HSPA+ turbo decoding. Second, it uses parallel execution for very high data bandwidth computation. This includes the 5-issue VLIW capability and the two load/store units that allow loading of dual memories in a single cycle. There are also 23 very tightly coupled scratch pad memories for storing a priori and state values that are accessed by instructions in parallel. This results in up to five memory accesses per cycle. Only this level of parallelism can give ConnX Turbo16MS the performance needed for multi-standard turbo decoding.
- Dual 128-bit load/store units
- LTE turbo decoding of up to 150 Mbps data streams with eight full iterations
- HSPA+ turbo decoding of up to 85 Mbps data streams with eight full iterations
- Small size and low power
The ConnX Turbo16MS provides multi-standard turbo decoding common to DTV, cdma2000, W-CDMA and LTE. Typically the system designer would be forced to implement this decoder in an ASIC, since it would consume or exceed the resources of a DSP. The ConnX Turbo16MS enables a fully programmable decoder in a smaller package and at lower power than a general-purpose programmable DSP can provide.
See the ConnX Turbo16MS Product Brief for more information
ConnX D2 2-MAC DSP Engine
The ConnX D2 option adds dual 16-bit multiply-accumulate (MAC) units and a 40-bit register file to the base RISC architecture of the Xtensa LX processor. The ConnX D2 engine utilizes two-way SIMD instructions to provide high performance on DSP algorithms. It also delivers dual-MAC performance using 64-bit VLIW (very long instruction word) instructions for code that cannot be vectorized.
- Both SIMD and 2-way FLIX (parallel VLIW) operations
- One or two load/store units
- High-performance DSP instruction set
- Dual write ports compute up to three results/cycle
- Supports TI (C6x) and ITU-T C intrinsic code base
- Bit-for-bit compatible with TI C6x code
- Optimized, vectorizing XCC Compiler
- C-centric programming model supports standard 16-, 32- and 40-bit data types
- Low area/cost - Can be less than 70K gates
- Optimized DSP kernel library
See the ConnX D2 Product Brief for more details.
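Code written in the ITU-T intrinsic style mentioned above is built from saturating fractional operations. The sketch below models a fractional multiply-accumulate of that style in portable C. The function names (`sat_add32`, `frac_mult32`, `frac_mac32`) are hypothetical stand-ins, loosely modeled on that style, and are not the ConnX D2 intrinsics themselves.

```c
#include <stdint.h>

/* Saturating 32-bit addition. */
int32_t sat_add32(int32_t a, int32_t b)
{
    int64_t s = (int64_t)a + b;
    if (s > INT32_MAX)
        return INT32_MAX;
    if (s < INT32_MIN)
        return INT32_MIN;
    return (int32_t)s;
}

/* Fractional multiply, Q15 x Q15 -> Q31: the integer product is doubled.
 * The only overflow case, (-1.0) * (-1.0), saturates to the maximum. */
int32_t frac_mult32(int16_t a, int16_t b)
{
    int64_t p = 2 * ((int64_t)a * b);
    return (p > INT32_MAX) ? INT32_MAX : (int32_t)p;
}

/* Fractional multiply-accumulate with saturation. */
int32_t frac_mac32(int32_t acc, int16_t a, int16_t b)
{
    return sat_add32(acc, frac_mult32(a, b));
}
```

A dot product in this style is a loop of `frac_mac32` calls, which is the pattern a dual-MAC engine can execute two elements at a time when the compiler maps the operations onto native instructions.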
ConnX Vectra DSP Engine
The ConnX Vectra DSP is a quad-MAC powerhouse that's ideal for wireless applications. It uses 64-bit instruction words containing three issue slots for ALU, multiply-accumulate, and load/store operations.
- 4-way 18x18-bit MAC architecture
- 64-bit FLIX instructions provide an efficient VLIW/SIMD hybrid architecture
- General DSP instruction set
- 4, 8 or 160 SIMD operations per cycle
- Optional second load/store unit provides up to 22 GB/s at 680 MHz
- Large vector register file for up to 122 GB/sec bandwidth
- Optional 4x40-bit or 8x20-bit vector operations (8-way MAC) with Vectra VMB
- Optimized DSP kernel library
The following chart illustrates how the ConnX Vectra DSP really accelerates FFT performance.
256pt FFT (Radix-4) Performance Improvement
|Minimal configuration Xtensa LX using software multiply||155,398|
|Base Xtensa LX with MUL32 option||23,633|
|Xtensa LX with ConnX Vectra DSP option||994|
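The chart does not state its unit, so the following assumes the figures are lower-is-better counts such as cycles. Under that assumption the improvement factors can be worked out with a one-line helper (the name `speedup` is hypothetical):

```c
/* Improvement factor between two lower-is-better measurements. */
double speedup(double baseline, double improved)
{
    return baseline / improved;
}
```

Applied to the three rows, `speedup(155398, 994)` is roughly 156, `speedup(23633, 994)` is roughly 24, and `speedup(155398, 23633)` is roughly 6.6. Most of the gain therefore comes from the step between the MUL32 configuration and the ConnX Vectra DSP option.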
Cadence is committed to helping you get your new design to market as quickly as possible. To that end, we have created the Atlas LTE reference design and we have partnered with mimoOn for LTE-Advanced PHY software.
The Atlas LTE Reference Architecture Jump Starts Your LTE Design
The Atlas LTE reference architecture implements the complete 3GPP Long Term Evolution (LTE) layer 1 PHY - including the computationally demanding Turbo decoder - in a completely processor based, fully programmable DSP core reference architecture. The Atlas reference architecture implements a fully programmable SDR, all controlled by software. All of the processors involved use the same easy software development, debug and simulation environment. You can easily partition your algorithm into these cores with simple synchronization.
The ConnX Atlas reference architecture is intended as a starting point for design teams implementing LTE baseband systems. A design team will integrate the Atlas components together with the Layer 2 design elements and system interconnect elements of the design team's choosing. The components of the Atlas architecture are modular. A designer may opt to deploy all or just some of these processors. Or a design team may opt to re-use pre-existing RTL blocks in lieu of one or more of the Atlas components.
See your local Tensilica representative for more information on the Atlas LTE Reference Architecture.
Comprehensive LTE-Advanced HW/SW PHY IP Solution
We partnered with mimoOn for the only comprehensive licensable IP solution for LTE-Advanced chip designs. Cadence is now the exclusive DSP IP vendor for mimoOn's LTE UE and eNodeB PHY software products.
Customize Your Signal Processing DPUs
See some interesting ideas, but want something slightly different? That's the beauty of Cadence's approach to IP design. From the start, we designed our IP to be customizable. We used that same technology to create these innovative baseband IP cores.
- Ultra-low power consumption and size, with optimized cores that reduce required system clock frequency
- Flexibility - scalable platform to fit all performance, power and area budgets that can be further customized to meet your needs
- Reduced development cost and development risk - all programmable in C - backed by a world-class development tool suite and multi-core support
- A low-risk solution with a large ecosystem supporting all Tensilica products.
We recommend two approaches to get you quickly to the exact product you need:
- Start with one of our standard ConnX products and modify it. This will save a lot of design work and effort.
- Start with our Xtensa LX processor and a clean sheet. Design everything just the way you'd like it.
For digital signal processing applications, with unique datapaths, processing requirements, algorithms, and memory requirements, this customization process is often essential to get the smallest, most energy-efficient core possible.
Either way, our automated tools will help you through the design process, making sure the design is correct by construction and helping you make sure you get the right mix of power, performance and area. And when you're done, our automated Xtensa processor generator will make sure you get not only the hardware for your new design, but also a complete matching software tool chain.
Tensilica Offers a Scalable Platform for All Design Approaches
Accelerate Hot Spots in Applications
You don't have to go to higher MHz to get higher performance. By adding instructions in our Verilog-like language (TIE), you can accelerate hot spots in your applications. You can pump data through our cores with up to two 512-bit-wide data load/stores per cycle, or bypass the bus entirely with our unique GPIO and FIFO Queues. Here are some ways you can customize our DPUs:
- The width of data load/store, computation execution and register files can all be tailored to a specific application
- Some applications may benefit greatly from vectorizing computation through a SIMD machine
- The size of SIMD and vector "strides" can be customized for optimum performance per power/area for the application
- Create instructions that perform application-specific tasks
- Create 'incredible performance' for your application and reduce the instruction memory footprint
Parallel instruction execution
- VLIW architecture to enable parallel computation of instructions
- Example: use one instruction to perform load, execute, store
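As a sketch of what a hot spot looks like before any customization, consider the C loop below. The inner operation is the kind of small, repeated computation that might be a candidate for a single application-specific instruction. No TIE syntax is shown because the text above does not give any, and the function names are hypothetical.

```c
#include <stdint.h>
#include <stddef.h>

/* Candidate operation: absolute difference of two samples added to a
 * running total. In plain C this is a subtract, a compare, a possible
 * negate and an add. */
uint32_t abs_diff_acc(uint32_t acc, int16_t a, int16_t b)
{
    int32_t d = (int32_t)a - b;
    return acc + (uint32_t)(d < 0 ? -d : d);
}

/* The hot loop: two loads and the candidate operation per element. */
uint32_t sum_abs_diff(const int16_t *a, const int16_t *b, size_t n)
{
    uint32_t acc = 0;
    for (size_t i = 0; i < n; i++)
        acc = abs_diff_acc(acc, a[i], b[i]);
    return acc;
}
```

A function like `abs_diff_acc` serves as a reference model: a fused instruction, or a wider SIMD version of it, has to produce bit-identical results, which makes the C version a natural test oracle during customization.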
See our Xtensa Processor section for more details.
Tools, Software, Libraries - We Have What You Need to Complete Your Design Quickly
For digital signal processing applications, with unique datapaths, processing requirements, algorithms, and memory requirements, Cadence's customization process is often essential to get the smallest, most energy-efficient core possible. No matter what changes you make, you'll find our tools and software will help you be more efficient.
For Processor Designers
Cadence delivers patented, proven tools that automate the process of generating a custom DSP or DPU along with matching software tools. These tools have been proven in hundreds of designs. Whether your design is for a simple controller or a complex multi-core DSP design, Cadence has the tools you need to create successful products.
View the complete set of tools for processor designers.
For Software Developers
When you need to develop your application software, the Xtensa Software Developer's Toolkit provides a comprehensive collection of code generation and analysis tools that speed the development process. Tensilica's Eclipse-based Xtensa Xplorer Integrated Development Environment (IDE) serves as the cockpit for the entire development experience.
View the complete set of tools for software developers.
Libraries and Existing DSP Code Base Support
We do everything we can to make it as easy as possible to port your existing DSP code to our DPUs. Our Xtensa C/C++ Compiler efficiently maps C algorithms to our DPUs, with no assembly coding required.
We also provide a range of DSP libraries already tailored to our products, so you can speed your design process.
Learn More About Our Baseband DSPs and DPUs
Seriously considering using a ConnX DSP in your next SoC design but want to learn more? Here are some things you should explore:
|Title||File Size||Last Modified|
|ConnX BBE16 (Baseband Engine) Product Brief
The ConnX BBE16 Baseband Engine is a high performance 16-MAC, 8-way SIMD, 3-issue VLIW DSP designed for use in next-generation communication baseband processors in LTE and 4G cellular radios and multi-standard broadcast receivers.
|ConnX BBE32 Product Brief
The ConnX BBE32 is a high-performance 32-MAC, 16-way SIMD ALU-based DSP supporting the full range of 4G and Wi-Fi baseband algorithms. The ConnX BBE32 is highly configurable with 10 vector options and provides a flexible, low-power platform for use in User Equipment and Infrastructure products.
|ConnX BBE32UE Product Brief
The ConnX BBE32UE is optimized for integration into a baseband processing chain that uses a mix of DSP and hardware-based offload accelerators. The ConnX BBE32UE provides a lower-power, smaller-die-area solution than the more fully featured ConnX BBE32.
|ConnX BBE64 (Baseband Engine) DSP
The ConnX BBE64 Baseband Engine is targeted at the most advanced components and latest versions of LTE-Advanced and Wi-Fi with its 64-MAC, 32-way SIMD ALUs and 4-issue VLIW processing pipeline.
|ConnX BSP Bit-Stream Processor Product Brief
The ConnX BSP3 is a high-performance DPU optimized for processing and manipulation of bit streams, including operations for CRC, interleavers, scramblers and more.
|ConnX D2 DSP Engine Product Brief
The ConnX D2 option adds dual 16-bit multiply-accumulate (MAC) units and a 40-bit register file to the base RISC architecture of the Xtensa LX processor. The ConnX D2 engine utilizes two-way SIMD (single instruction, multiple data) instructions to provide high performance on vectorizable C code.
|ConnX SSP16 Soft Stream Processor
The ConnX SSP16 DPU is optimized for processing streams of 8-to-10-bit vectors typical of CDMA, LTE or Wi-Fi demodulators. It includes optional units for Viterbi decoding, demapping, and LFSR operations. It uses a 16-way SIMD, 3-slot VLIW pipeline optimized for 10- and 8-bit processing.
|ConnX Turbo16MS Product Brief
The ConnX Turbo16MS DPU is specifically designed for decoding LTE Turbo codes on data streams of up to 150 Mbps and HSPA+ data streams of up to 85 Mbps. It uses parallel execution for very high bandwidth computation.
|ConnX Vectra LX DSP Engine Product Brief
The ConnX Vectra LX DSP engine is the 4-MAC member of Tensilica's DSP family. Ideal application areas include smart meters, short-range wireless, broadband modems, broadcast demodulation, and wire-line communications.
Hardware/Software Design Tools
|Title||File Size||Last Modified|
|Xtensa Processor Developer's Toolkit Product Brief
Use the Xtensa Processor Developer’s Toolkit (PDK) to customize your Tensilica DPU. This Eclipse-based IDE has a full GUI that lets you pick your configuration options and add simple Verilog-like TIE for further customization.
|Xtensa Software Developer's Toolkit Product Brief
If you need to develop application code for a Tensilica DPU, the Xtensa Software Developer’s Toolkit provides a comprehensive collection of code generation and analysis tools that speed the development process.
|Title||File Size||Last Modified|
|Optimizing a DSP Architecture For Wireless Baseband
The high computation demands of next-generation cellular and broadcast wireless require both higher efficiency and greater flexibility in baseband processing. New DSP architectures are needed for applications with heavy workloads with complex filtering, FFT, and MIMO matrix operations.
|Cut DSP Development Time
The magic is in the compiler technology. Learn how an advanced compiler can help you get equivalent or better performance using standard C than other DSPs programmed in assembly code.
|Microprocessor Report Reviews ConnX BBE64
See what the insider's guide to microprocessor hardware has to say about BBE64.
|Tensilica Xtensa LX Processor with Vectra LX
This BDTI report evaluates the highest performance DSP core BDTI has tested.
|An Efficient, High-Performance DSP Architecture for WCDMA Receivers
This whitepaper begins with a comprehensive summary of the algorithms for WCDMA (Wideband Code Division Multiple Access) modem systems. This is followed by a detailed description of how the WCDMA algorithms can be implemented using Tensilica DSP cores and programmable accelerators for a WCDMA system. Finally, a use case is given on easily updating an existing LTE/LTE-Advanced modem system to become a multi-standard 3G/WCDMA system.