CEVA Hits 1 GHz With Latest TeakLite DSP Core
CEVA has added the CEVA-TL3211 core to its TeakLite family of licensable DSP cores targeting applications ranging from handset baseband processing to audio processing in home-network, multimedia gateway, and living-room multimedia products. According to CEVA, the new core will reach a clock speed of 1 GHz in a 40 nm implementation and includes a new fully-cached memory design. The new core bumps up the performance of the broadly licensed TeakLite family while offering binary compatibility with the CEVA-TL3210 and other prior family members. A block diagram of the 32-bit core is depicted in Figure 1.
CEVA’s TeakLite family has found broadest use in baseband processing for 2G/3G handset and low-cost smartphone applications. CEVA believes the new core will also find use in those applications while providing more headroom to tackle multimedia tasks. But CEVA has also sought to optimize the TL3211 core for applications such as audio processing in high-end Blu-ray players and network-enabled set-top boxes.
Figure 1: The CEVA-TL3211 DSP core uses a fully cached architecture and includes an FFT accelerator and a 32×32-bit MAC targeted at audio applications.
The TL3211 core is designed for implementation in 40 nm processes, but CEVA expects that it will also be implemented in other process nodes, such as 65 nm low-power. CEVA points out that power consumption is increasingly important in consumer electronics, in part due to government initiatives such as Energy Star. CEVA also believes some customers will choose the 65 nm low-power process because lower heat dissipation will enable lower-cost chip packaging options.
The new core relies on essentially the same microarchitecture and pipeline that CEVA used in the prior-generation core, the TL3210. But the new design supports a faster clock rate, includes a more robust memory system, adds power-management features, includes a new system bus, and offers new features that accelerate audio processing. CEVA believes the new core will simplify the design of multimedia-centric consumer devices, increase audio fidelity and feature sets in such devices, and deliver cost savings.
A faster clock speed is a key part of the story in consumer devices because devices such as top-end Blu-ray players require simultaneous processing of multiple audio streams to support advanced features such as director’s commentary. According to CEVA, the TL3211 will achieve a clock speed of 1 GHz in a TSMC 40 nm G (general purpose) process under worst-case conditions, roughly a 10% speed boost over the prior-generation TL3210 core. The company attributes the boost primarily to improvements in critical memory paths. CEVA also says the core will max out at 480 MHz when realized in a 65 nm LP process, and at 850 MHz in a 65 nm G process.
The core has a number of accelerator blocks intended for use in both baseband modems and audio applications, as well as an enhanced multiply-accumulate (MAC) unit for better audio fidelity. The MAC implementation uses a 32×32-bit multiplier with a 72-bit accumulator for improved dynamic range. The double-precision FFT accelerator is intended to speed implementations of high-end audio codecs, and the Viterbi accelerator is aimed at wireless baseband applications.
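To make the dynamic-range point concrete, the following is a minimal sketch of the headroom arithmetic. It assumes signed two's-complement operands and a signed accumulator, which the article does not state; only the 32-bit and 72-bit widths come from CEVA's description, and the function names are illustrative.

```python
# Bit widths taken from the article; the signed two's-complement
# interpretation is an assumption made for this sketch.
OPERAND_BITS = 32
ACC_BITS = 72

def max_product_magnitude(bits):
    """Largest magnitude of a product of two signed operands of this width."""
    most_negative = -(1 << (bits - 1))
    return most_negative * most_negative  # (-2^31)^2 = 2^62 for 32 bits

def safe_accumulations(operand_bits, acc_bits):
    """Worst-case products that fit in a signed accumulator without overflow."""
    acc_max = (1 << (acc_bits - 1)) - 1
    return acc_max // max_product_magnitude(operand_bits)

# Bits in the accumulator beyond the 64-bit full-width product.
guard_bits = ACC_BITS - 2 * OPERAND_BITS

print("guard bits:", guard_bits)
print("worst-case products before overflow:",
      safe_accumulations(OPERAND_BITS, ACC_BITS))
```

Under these assumptions the accumulator carries 8 bits beyond the full 64-bit product, so a long filter or transform loop can sum several hundred worst-case products before any saturation or scaling is needed.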
The TL3211 features caches for both program and data memory, whereas the earlier TL3210 had a cache only for program memory. The two-way, set-associative caches can use copy-back or write-through policies. Chip designers using the TL3211 can configure the cache sizes. According to CEVA, the fully cached memory architecture is intended to reduce system costs by enabling the use of low-cost external DDR memory.
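The difference between the two write policies can be illustrated with a toy model. This is a sketch of a generic two-way set-associative cache, not CEVA's design: the LRU replacement, write-allocate behavior, set count, and line size are all assumptions chosen for illustration.

```python
class TwoWayCache:
    """Toy two-way set-associative cache with LRU replacement (illustrative)."""

    def __init__(self, num_sets, line_words, copy_back):
        self.num_sets = num_sets
        self.line_words = line_words
        self.copy_back = copy_back
        # Each set holds up to two [tag, dirty] entries, most recently used last.
        self.sets = [[] for _ in range(num_sets)]
        self.memory_writes = 0  # writes that reach external memory
        self.line_fills = 0     # lines fetched from external memory

    def access(self, address, is_write):
        line = address // self.line_words
        index = line % self.num_sets
        tag = line // self.num_sets
        ways = self.sets[index]
        entry = next((e for e in ways if e[0] == tag), None)
        if entry is None:
            self.line_fills += 1  # miss: fetch the line (write-allocate assumed)
            if len(ways) == 2:
                victim = ways.pop(0)  # evict the least recently used way
                if victim[1]:
                    self.memory_writes += 1  # copy back the dirty line
            entry = [tag, False]
        else:
            ways.remove(entry)
        ways.append(entry)  # mark as most recently used
        if is_write:
            if self.copy_back:
                entry[1] = True  # defer the write until eviction
            else:
                self.memory_writes += 1  # write-through: every store goes out

def run(copy_back):
    cache = TwoWayCache(num_sets=4, line_words=8, copy_back=copy_back)
    for _ in range(100):
        cache.access(address=0, is_write=True)  # repeated store to one word
    return cache.memory_writes, cache.line_fills

print("write-through (memory writes, line fills):", run(copy_back=False))
print("copy-back     (memory writes, line fills):", run(copy_back=True))
```

In this model the write-through run sends all 100 stores to external memory, while the copy-back run sends none until the dirty line is eventually evicted. That trade-off is why a copy-back option matters when the backing store is comparatively slow external DDR.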
The TL3211 is the first TeakLite core to integrate the power-scaling unit (PSU) that CEVA developed for the CEVA-X and CEVA-XC cores. According to CEVA, the PSU reduces both static and dynamic power consumption.
The TL3211 incorporates 64/128-bit AXI (Advanced Extensible Interface) system buses, including master and slave buses, as illustrated in Figure 1. AXI is the latest SoC interconnect scheme in the AMBA (Advanced Microcontroller Bus Architecture) family driven by ARM. AXI is becoming widely used to connect a core with third-party IP blocks in SoC designs; its incorporation into the TL3211 will therefore simplify the SoC integration process, according to CEVA.
Having discussed key features and manufacturing characteristics, let’s have a look at what the TL3211 delivers in terms of performance, die size, and power consumption.
CEVA detailed a Blu-ray player use case based on DTS-HD audio to illustrate TL3211 performance in an application. The application requires a primary audio decode of a 24.5 Mbps stream to deliver 5.1-channel, 192 kHz audio. The application also requires a secondary audio decode of a 256 kbps stream for interactive audio features such as commentary. The audio system must also encode a 48 kHz signal, and handle up-sample, down-sample, and mixing operations to support interactive applications.
According to CEVA, the TL3211 can handle this Blu-ray use case at a clock speed of 340 MHz. Based on CEVA’s clock speed projections for the TL3211, this use case will fit comfortably in a 65 nm LP implementation of the TL3211, with headroom left over for other tasks. CEVA considers the Tensilica HiFi EP core the closest competing core in such applications. Tensilica published a similar Blu-ray use case on its web site, indicating that a HiFi EP core requires a clock speed of 386 MHz to handle the audio tasks. According to the Tensilica site, the HiFi EP core tops out at 312 MHz in a 65 nm LP process, so it would not be able to implement this use case in that process.
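The headroom comparison reduces to simple arithmetic on the vendor-quoted figures. The sketch below uses only the clock numbers cited in this article; the `headroom` helper is illustrative. The two use cases are described as similar, not identical, so the result is a rough comparison, not a benchmark.

```python
# Vendor-quoted figures from the article, in MHz.
cores = {
    "CEVA TL3211": {"required": 340, "max_65nm_lp": 480},
    "Tensilica HiFi EP": {"required": 386, "max_65nm_lp": 312},
}

def headroom(required_mhz, max_mhz):
    """Fraction of the maximum clock left after the workload.

    A negative value means the workload does not fit at that clock.
    """
    return (max_mhz - required_mhz) / max_mhz

for name, c in cores.items():
    h = headroom(c["required"], c["max_65nm_lp"])
    verdict = "fits" if h >= 0 else "does not fit"
    print(f"{name}: {h:+.1%} headroom in 65 nm LP ({verdict})")
```

On these numbers the TL3211 leaves about 29% of its 65 nm LP clock budget free, while the HiFi EP falls short by about 24% of its maximum clock.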
According to CEVA, the TL3211 core measures 0.2 mm² in a 40 nm process with no memory. An implementation with 24 kwords of data memory and 8 kwords of program memory (the configuration for the Blu-ray use case) measures 0.6 mm².
Power consumption will depend on many factors, such as the process used, memory configuration, and clock speed. CEVA provided two power projections for audio-centric tasks: according to CEVA, a Dolby Digital Plus decoder would require 80 µW/MHz, and an MP3 decoder would require 50 µW/MHz.
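A per-MHz figure turns into a power estimate only once a clock rate is known. The sketch below shows the conversion; the 100 MHz clock is a purely hypothetical placeholder, since CEVA did not state the clock rate each decoder requires, and the function name is illustrative.

```python
def active_power_mw(uw_per_mhz, clock_mhz):
    """Convert a µW/MHz figure and a clock rate into milliwatts."""
    return uw_per_mhz * clock_mhz / 1000.0

# Per-MHz figures quoted by CEVA.
projections_uw_per_mhz = {
    "Dolby Digital Plus decoder": 80,
    "MP3 decoder": 50,
}

# Hypothetical clock rate, for illustration only; not a CEVA figure.
ASSUMED_CLOCK_MHZ = 100

for task, uw_per_mhz in projections_uw_per_mhz.items():
    mw = active_power_mw(uw_per_mhz, ASSUMED_CLOCK_MHZ)
    print(f"{task}: {mw:.1f} mW at an assumed {ASSUMED_CLOCK_MHZ} MHz")
```

Because the result scales linearly with the clock rate, the per-MHz figures alone say little about total power until the actual decoder workload in MHz is known.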
It’s almost impossible to judge the real power characteristics of the TL3211 core without more data. The performance claims, by contrast, seem fairly solid given that the design is an evolutionary improvement over the TL3210 core and the improvements are relatively straightforward. CEVA directly provides more than 90 audio and voice codecs for the TL3210 and TL3211 cores. CEVA’s continuing investment in both the TeakLite cores and in optimized audio software components indicates that CEVA is serious about competing in the market for audio-oriented DSP cores.