Image/Video Processing


Built for Next-Generation Imaging/Video Requirements

Today's applications processors are not equipped to handle the complex image/video signal processing functions in mobile handsets, tablets, DTVs, automotive, video game and computer vision applications.

Cadence's IVP is a much-needed breakthrough in energy efficiency and performance for current products. It also enables applications never before possible in a programmable device.

The IVP DSP was designed specifically to handle complex algorithms including innovative multi-frame image capture and video pre- and post-processing, video stabilization, HDR image and video processing, object and face recognition and tracking, low-light image enhancement, digital zoom and gesture recognition.

Did You Know?

The Apple iMac is just one of many computers that use AMD ATI Radeon graphics. Tensilica helps accelerate video stream decoding in the UVD-powered Radeon graphics chips.

Nothing Else Comes Close

It is now well accepted that even multi-core host CPUs just can’t handle these demanding applications. Even with four cores running at 1.50 GHz, the power required to run high-resolution video processing approaches 3W (and that’s without considering the OS and other functions that must run at the same time). There’s not enough energy efficiency to support today’s complex imaging features, let alone tomorrow’s.

Some of these functions have been offloaded to hardwired accelerators. However, these fixed-hardware implementations are virtually impossible to program and are therefore restricted to a fixed set of functions. When new, enhanced algorithms are developed, these hardware blocks need to be redesigned. And the number of hardware blocks for all of the possible image and video enhancement algorithms just keeps growing.

Another option is to offload imaging algorithms to a GPU. However, GPUs typically offer floating-point pipelines designed specifically for 3D graphics algorithms, and these are mostly unnecessary or inefficient for image and video processing applications. Also, most GPUs are somewhat difficult to program.

This opens up an opportunity for a programmable solution like the IVP from Cadence. It’s a much-needed breakthrough in terms of energy efficiency and performance. It also enables applications never before possible in a programmable device. Now imaging and video algorithms can run on a processor-based DSP that’s specifically optimized for the pixel computations required.

A Complete Platform for Image and Video Processing

The IVP is a licensable, synthesizable subsystem with rich software tools and libraries. The instruction set, memory system and data types have all been optimized for high-throughput 8-, 16- and 32-bit pixel processing. 

IVP is much more than just a processor. It's a complete platform for image and video processing.

IVP Subsystem 

The IVP Platform

Details on the Platform

micro-DMA Transfer Engine


The μDMA engine is a chaining DMA with interleaved 3D transfers, closely coupled to the IVP core to reduce DMA programming and completion overhead. It operates autonomously and in parallel with the core, with an independent 512-bit/cycle memory port into local data memory and a 128-bit/cycle port onto the AXI bus. It eliminates memory latency for loads and stores. It offers up to 10 GBytes/second of throughput to keep up with the rapid pace of resolution and frame rate requirements.
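To make the idea of a 3D transfer concrete, the following plain-C sketch models in software what a single such transfer describes: a stack of 2D tiles, each copied row by row with independent source and destination strides. The `transfer3d_t` structure and the `transfer3d_run` function are hypothetical names invented for this illustration. They are not the μDMA descriptor format or programming interface, which this page does not describe.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical descriptor for one 3D transfer (illustration only,
   not the actual uDMA descriptor layout). All sizes are in bytes. */
typedef struct {
    const uint8_t *src;       /* first byte of the source region       */
    uint8_t *dst;             /* first byte of the destination region  */
    size_t row_bytes;         /* contiguous bytes copied per row       */
    size_t rows;              /* rows per 2D tile                      */
    size_t planes;            /* number of 2D tiles (third dimension)  */
    size_t src_row_stride;    /* distance between source rows          */
    size_t dst_row_stride;    /* distance between destination rows     */
    size_t src_plane_stride;  /* distance between source tiles         */
    size_t dst_plane_stride;  /* distance between destination tiles    */
} transfer3d_t;

/* Software model of the copy: a DMA engine would perform the same
   address walk on its own while the processor keeps computing. */
static void transfer3d_run(const transfer3d_t *t)
{
    for (size_t p = 0; p < t->planes; ++p) {
        for (size_t r = 0; r < t->rows; ++r) {
            const uint8_t *s = t->src + p * t->src_plane_stride
                                      + r * t->src_row_stride;
            uint8_t *d = t->dst + p * t->dst_plane_stride
                                + r * t->dst_row_stride;
            memcpy(d, s, t->row_bytes);
        }
    }
}
```

A typical use of such a descriptor would be to pull a small tile out of a large frame in system memory into a densely packed buffer in local data memory, which is why the source and destination strides are kept separate.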

Direct RTL Interfaces


Optional port connections to legacy RTL blocks let designers stream data and control between the IVP core and the RTL blocks without having to go through memory.

Memory/Network Interface


The memory/network interface combines the IVP core and micro-DMA transactions with cluster traffic.

Highly Energy Efficient


The IVP is highly energy efficient compared to CPUs or GPUs for 16-bit pixel operations (e.g., absolute-difference, multiply-add, shift-saturate). As an example, for IVP implemented in an automatic synthesis, place-and-route flow in 28nm HPM process, regular Vt, a 32-bit integral image computation on 16b pixel data at 1080p30 consumes 10.8 mW. The integral image function is commonly used in applications such as face and object detection and gesture recognition.
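For readers unfamiliar with the integral image, here is a minimal scalar reference in C for a 32-bit integral image over 16-bit pixels, together with the four-lookup box sum that makes it useful for detection workloads. This is a generic textbook formulation, not Cadence's implementation, and the function names are chosen for this sketch only.

```c
#include <stddef.h>
#include <stdint.h>

/* out[y*width + x] = sum of in[j*width + i] for all i <= x and j <= y.
   Sums are unsigned 32-bit and wrap modulo 2^32 on large, bright frames. */
static void integral_image_u16(const uint16_t *in, uint32_t *out,
                               size_t width, size_t height)
{
    for (size_t y = 0; y < height; ++y) {
        uint32_t row_sum = 0;  /* running sum of the current row */
        for (size_t x = 0; x < width; ++x) {
            row_sum += in[y * width + x];
            uint32_t above = (y > 0) ? out[(y - 1) * width + x] : 0;
            out[y * width + x] = row_sum + above;
        }
    }
}

/* Sum of the pixels in the inclusive box [x0, x1] x [y0, y1],
   computed from four integral-image lookups. */
static uint32_t box_sum(const uint32_t *ii, size_t width,
                        size_t x0, size_t y0, size_t x1, size_t y1)
{
    uint32_t a = (x0 > 0 && y0 > 0) ? ii[(y0 - 1) * width + (x0 - 1)] : 0;
    uint32_t b = (y0 > 0) ? ii[(y0 - 1) * width + x1] : 0;
    uint32_t c = (x0 > 0) ? ii[y1 * width + (x0 - 1)] : 0;
    uint32_t d = ii[y1 * width + x1];
    return d - b - c + a;  /* unsigned arithmetic, exact modulo 2^32 */
}
```

Because the arithmetic is unsigned and modular, a box sum remains correct even after the running totals have wrapped, as long as the true sum of the box itself fits in 32 bits.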

High Performance


IVP’s high performance is demonstrated by complex kernels such as motion search and normalized cross-correlation, commonly used in high-precision block and feature matching and optical flow. For a smart motion search on 16-bit data over a 1920x1080 frame with 256x16 pixel search range and 9x3 pixel block size, IVP can achieve a rate of 142 sums of absolute differences per cycle. A normalized cross-correlation function on 16-bit pixel data with 32-bit accuracy achieves 1 million 8x8 blocks per second.
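As a point of reference for what one "sum of absolute differences" involves, the sketch below computes the SAD of a 9x3 block of 16-bit pixels in scalar C and wraps it in a plain exhaustive search. The exhaustive search is a deliberately simple stand-in: the "smart" search strategy behind the figure quoted above is not described on this page. The function names and the convention that both images share one row stride are assumptions of this sketch.

```c
#include <stddef.h>
#include <stdint.h>

enum { BLOCK_W = 9, BLOCK_H = 3 };

/* SAD between two 9x3 blocks of 16-bit pixels; both blocks use the same
   row stride, given in pixels. The result is at most 27 * 65535. */
static uint32_t sad_9x3(const uint16_t *cur, const uint16_t *ref, size_t stride)
{
    uint32_t sad = 0;
    for (size_t y = 0; y < BLOCK_H; ++y) {
        for (size_t x = 0; x < BLOCK_W; ++x) {
            uint16_t a = cur[y * stride + x];
            uint16_t b = ref[y * stride + x];
            sad += (a > b) ? (uint32_t)(a - b) : (uint32_t)(b - a);
        }
    }
    return sad;
}

/* Exhaustive search over offsets [0, range_w) x [0, range_h) in ref.
   Returns the lowest SAD and writes the first offset that achieves it. */
static uint32_t full_search_9x3(const uint16_t *cur, const uint16_t *ref,
                                size_t stride, size_t range_w, size_t range_h,
                                size_t *best_dx, size_t *best_dy)
{
    uint32_t best = UINT32_MAX;
    for (size_t dy = 0; dy < range_h; ++dy) {
        for (size_t dx = 0; dx < range_w; ++dx) {
            uint32_t s = sad_9x3(cur, ref + dy * stride + dx, stride);
            if (s < best) {
                best = s;
                *best_dx = dx;
                *best_dy = dy;
            }
        }
    }
    return best;
}
```

Each candidate offset costs 27 absolute differences and 27 additions in this form, which is why the inner loops are a natural target for wide SIMD hardware.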


 

A 32-Element Engine, 4-Way VLIW, 16-bit Fixed Point Imaging/Video DSP

The IVP is a licensable, synthesizable subsystem with rich software tools and libraries. The instruction set, memory system and data types have all been optimized for high-throughput 8-, 16- and 32-bit pixel processing. It has an architecture that can scale by both the number of element engines as well as the number of processors.

 IVP core

The IVP Core Architecture
With Sample Memory Sizes Selected

 

Details on the Core Architecture

Summary


The IVP core is based on our proven Xtensa architecture. It uses a 4-way instruction issue with up to three pixel arithmetic operations per cycle (MUL, MAC, select, shift, ALU). It is capable of two 512-bit (32x16-bit) pixel data memory references per cycle. It features a 32-way vector SIMD with 16-bit register elements. It employs 8- and 16-bit memory elements. And its instruction set is further extensible by the designer.

32-way Vector SIMD Datapath


Each SIMD slice contains a rich set of computation resources:

  • 3 independent 16-bit ALUs, 16x16 multiplier, 16-bit variable shifter
  • 3 register files per slice (pixel register file, predicate register file and shift select register file)
  • Interface to memory system that loads 8- or 16-bit data
  • Cross-element select and reduction network for arbitrary number of element swaps or reduction of operations per cycle (e.g. reduction min-max, reduction adds)
  • Memory rotator that operates at full rate on data from or to unaligned structures
  • Predication fully supported by compiler for high utilization
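The predicate register file and the reduction network listed above can be pictured with a scalar C model of a single 32-element operation. In this sketch each array index stands for one SIMD element, a boolean array plays the role of a predicate register, and a loop stands in for the reduction network. The names are illustrative only. They are not IVP intrinsics or instructions, and the real hardware performs each of these loops as one vector operation.

```c
#include <stdbool.h>
#include <stdint.h>

enum { LANES = 32 };  /* one entry per SIMD element */

/* Predicated absolute difference: lanes whose predicate is false
   keep their previous dst value instead of being overwritten. */
static void absdiff_predicated(uint16_t dst[LANES],
                               const uint16_t a[LANES],
                               const uint16_t b[LANES],
                               const bool pred[LANES])
{
    for (int i = 0; i < LANES; ++i) {
        uint16_t d = (uint16_t)((a[i] > b[i]) ? a[i] - b[i] : b[i] - a[i]);
        dst[i] = pred[i] ? d : dst[i];
    }
}

/* Reduction across all lanes: collapses 32 elements to one minimum
   and one maximum. */
static void reduce_min_max(const uint16_t v[LANES],
                           uint16_t *min_out, uint16_t *max_out)
{
    uint16_t lo = v[0];
    uint16_t hi = v[0];
    for (int i = 1; i < LANES; ++i) {
        if (v[i] < lo) lo = v[i];
        if (v[i] > hi) hi = v[i];
    }
    *min_out = lo;
    *max_out = hi;
}
```

Predication of this kind is what lets a compiler turn a loop containing an `if` into straight-line vector code: both outcomes are computed and the predicate selects which lanes are updated.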

4-way VLIW


The 4-way VLIW issue of vector operations gives an almost arbitrary mix of loads, stores, multiplies, and three ALU operations all taking place simultaneously across all 32 element engines.

Custom Instructions


The IVP features many imaging-specific operations to accelerate 8-, 16- and 32-bit pixel data types and video operation patterns.

A Highly Customizable Processor


Because the IVP is based on our proven Xtensa architecture, the core can be further optimized and configured using our automated Tensilica processor generator system. Please see the Xtensa section for all of the options available. The Xtensa Processor Generator creates a complete hardware design with matching software tools, including a mature, world-class auto-vectorizing compiler, a cycle-accurate SystemC-compatible instruction set simulator (ISS) and the full industry standard GNU toolchain.

Our Proven, Comprehensive HW and SW Design Environment

DPU design process

 

For Processor Designers

Cadence delivers patented, proven tools that automate the process of further customizing and delivering the IVP along with matching software tools. These tools have been proven in hundreds of designs. You get RTL, EDA scripts, and reference test bench and test cases. You also get an instruction set simulator, fast functional simulator, SystemC modeling tools, and pin-level cosimulation.

View the complete set of tools for processor designers.

Software development process

For Software Developers

Cadence provides a comprehensive Software Developer's Toolkit with code generation and analysis tools that speed the development process. Our Eclipse-based Xtensa Xplorer Integrated Development Environment (IDE) serves as the cockpit for the entire development experience. Our C/C++ Compiler is very highly rated, and has auto-vectorization to make compiling your code onto the IVP much easier. 

View the complete set of tools for software developers.

Port your software quickly in C - no assembly programming is required or recommended. Even our partners port and optimize their software in C. We also provide an image processing library to speed up your software design.
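As an illustration of the style of C that auto-vectorizing compilers generally handle well, the sketch below applies a fixed-point gain to a row of 16-bit pixels using a multiply, a rounding shift and a saturate. It is generic C99 with no vendor intrinsics. The function name and the Q8 gain format are assumptions of this example, and whether a given compiler vectorizes it depends on that compiler and its options.

```c
#include <stddef.h>
#include <stdint.h>

/* Multiply each pixel by a Q8 fixed-point gain (256 represents 1.0),
   round to nearest, and saturate to the 16-bit range.
   'restrict' promises that in and out do not overlap, which removes an
   aliasing hazard that can otherwise block vectorization. */
static void apply_gain_q8(const uint16_t *restrict in,
                          uint16_t *restrict out,
                          size_t n, uint16_t gain_q8)
{
    for (size_t i = 0; i < n; ++i) {
        /* 65535 * 65535 + 128 still fits in an unsigned 32-bit value */
        uint32_t scaled = ((uint32_t)in[i] * gain_q8 + 128u) >> 8;
        out[i] = (scaled > 65535u) ? (uint16_t)65535u : (uint16_t)scaled;
    }
}
```

The loop has a simple trip count, unit-stride accesses, no function calls and no cross-iteration dependencies, which are the usual preconditions for vectorization.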

IVP Board

Application Demo Platform

Running imaging/video applications in real time requires the complete pipeline from sensor to processor to video output. This FPGA-based demo platform allows for integration of imaging applications in a real-time environment.

Documentation & Literature

Product Briefs

IVP Imaging/Video DSP Product Brief (file size: 157 KB, last modified: 02/11/2013)
The IVP imaging/video DSP includes a unique instruction set tuned for multi-frame image capture and video pre- and post-processing algorithms, as well as video stabilization, HDR for image and video, object and face recognition and tracking, low-light enhancement, digital zoom and gesture recognition.

 


Upcoming Events

June 2-6, 2013
Design Automation Conference
Austin Convention Center, Austin, Texas USA, Cadence Booth 2214


©2013 Tensilica Inc. All rights reserved.