TI Supports Floating-Point in New High-Performance DSPs

Submitted by lapsley on Thu, 11/18/2010 - 20:00

In early 2010, Texas Instruments (TI) announced a new multi-core DSP SoC architecture. This month, TI announced the first chips based on this architecture. This latest announcement includes details of TI’s new TMS320C66x (C66x) DSP processor core, which offers both state-of-the-art fixed-point performance and strong floating-point support. The multi-core architecture and C66x core underlie a family of new general-purpose DSPs, as well as two chips for wireless infrastructure applications, one of which specifically targets 4G LTE networks.  Based on BDTI benchmarks, the new core architecture will afford designers the processing power needed for demanding applications in areas such as communications, radar, and medical imaging, along with floating-point support that can simplify the development process.

The chip announcements start with a family of three pin-compatible general-purpose DSPs: the TMS320C6672, TMS320C6674, and TMS320C6678, with 2, 4, and 8 cores respectively. Available clock speeds range from 1.0 to 1.25 GHz.

The company also announced two special-purpose chips: the TMS320C6670 communications processor (1.0 to 1.2 GHz clock speed) and the TMS320TCI6616 wireless base station SoC (1.0 to 1.2 GHz). Both chips integrate four C66x cores, a network coprocessor for packet processing, and physical-layer accelerators for tasks such as FFTs, Viterbi decoding, and turbo encoding and decoding.

The TMS320TCI6616 (see Figure 1) is intended specifically for 4G base stations, while the TMS320C6670 is intended for other software-defined radio applications. Accordingly, the TMS320TCI6616 incorporates additional base-station-specific accelerators for 3G chip-rate and 4G bit-rate processing. TI claims that the TMS320TCI6616 can implement a single-sector LTE baseband with no additional FPGAs or ASICs required.


Figure 1. Block diagram of the new Texas Instruments TMS320TCI6616, which targets LTE base stations.

In all five of the new chips, the DSP cores are connected via a switched interconnect fabric that also links the network coprocessor, physical-layer accelerators, and other hardware resources. According to TI, the interconnect provides an aggregate bandwidth of 2 Tbits/second through high-speed, non-blocking channels that transfer packet data. A packet manager controls 8,192 queues and directs tasks to suitable hardware resources that are available, including cores, coprocessors, and accelerators.
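To make the idea of queue-based task dispatch concrete, here is a toy Python model. It is a hypothetical sketch, not TI's hardware design or any TI software API: the class name `ToyQueueManager`, the static queue-to-resource mapping, the resource names, and the "one descriptor per idle resource per pass" policy are all assumptions made for illustration. Only the figure of 8,192 queues comes from TI.

```python
from collections import deque

NUM_QUEUES = 8192  # queue count quoted by TI; everything else here is hypothetical


class ToyQueueManager:
    """Toy model of a hardware queue manager (not TI's design or API).

    Each numbered queue is a FIFO of task descriptors. A queue can be
    assigned to one consuming resource (for example a core or an accelerator).
    """

    def __init__(self, num_queues: int = NUM_QUEUES) -> None:
        self.queues = [deque() for _ in range(num_queues)]
        self.owner = {}  # queue index -> resource name (hypothetical mapping)

    def assign(self, qid: int, resource: str) -> None:
        self.owner[qid] = resource

    def push(self, qid: int, descriptor: str) -> None:
        self.queues[qid].append(descriptor)

    def dispatch(self, idle):
        """Hand at most one descriptor to each idle resource.

        Queues are scanned in the order they were first assigned; a resource
        that receives work is treated as busy for the rest of this call.
        """
        idle = set(idle)  # copy so the caller's set is not modified
        issued = []
        for qid, resource in self.owner.items():
            if resource in idle and self.queues[qid]:
                issued.append((resource, self.queues[qid].popleft()))
                idle.discard(resource)
        return issued


qm = ToyQueueManager()
qm.assign(0, "core0")
qm.assign(1, "fft_accel")
qm.push(0, "task-A")
qm.push(1, "fft-1")
qm.push(1, "fft-2")
print(qm.dispatch({"core0", "fft_accel"}))  # [('core0', 'task-A'), ('fft_accel', 'fft-1')]
print(qm.dispatch({"fft_accel"}))           # [('fft_accel', 'fft-2')]
```

TI's actual mechanism is implemented in hardware and is not modeled here; the sketch only conveys the general pattern of producers and consumers exchanging work through numbered queues rather than calling each other directly.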

Perhaps the most interesting aspect of the announcement is that TI has, for the first time, incorporated floating-point support into its highest-performance DSP core. Historically, DSP system designers have faced a tough choice between fixed-point and floating-point processors: while floating-point processors are significantly easier to use, fixed-point devices have offered better performance, energy efficiency, and cost-performance. With the new C66x core, TI believes it can deliver the best of both worlds.

BDTI has noted a growing trend toward increased use of floating-point processors in DSP applications. As Jeff Bier wrote in his Impulse Response column in the last issue of InsideDSP: “It sounds simple, but for applications that use sophisticated algorithms, migrating from floating-point to fixed-point math is a complex undertaking.  In some cases, the effort required to perform this conversion can equal the effort required to develop the algorithm in the first place.  And the migration process requires a specialized set of skills that appears to be becoming rarer.  Many design teams lack the time or skills required for this work.  As applications and algorithms become more complex, while design teams remain roughly the same size, or even shrink, the incentive to avoid dealing with fixed-point math will increase.”
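As a small illustration of why the conversion takes effort, the sketch below computes the same dot product (the core operation of an FIR filter) twice: once in floating point and once in Q15 fixed point. The Q15 format, the round-to-nearest step, and the saturation policy are illustrative assumptions, not a description of any TI library. Even in this tiny kernel, the fixed-point version must choose a number format, track the Q30 intermediate result, and decide how to round and saturate.

```python
Q15_ONE = 1 << 15       # 1.0 scaled to Q15; not itself representable in 16 bits
Q15_MAX = Q15_ONE - 1   # largest Q15 value, 1 - 2**-15
Q15_MIN = -Q15_ONE      # smallest Q15 value, -1.0


def saturate_q15(v: int) -> int:
    """Clamp an integer to the 16-bit Q15 range."""
    return max(Q15_MIN, min(Q15_MAX, v))


def to_q15(x: float) -> int:
    """Quantize a float to Q15 (Python's round() uses ties-to-even), then saturate."""
    return saturate_q15(round(x * Q15_ONE))


def from_q15(q: int) -> float:
    return q / Q15_ONE


def dot_float(x, h):
    """Floating-point dot product: no scaling decisions needed."""
    return sum(a * b for a, b in zip(x, h))


def dot_q15(xq, hq):
    """Q15 dot product with a wide accumulator, rounding, and saturation."""
    acc = 0  # Python ints are unbounded; real DSP accumulator width is hardware-specific
    for a, b in zip(xq, hq):
        acc += a * b          # Q15 x Q15 -> Q30 product
    acc += 1 << 14            # add half an LSB of the Q15 result to round to nearest
    return saturate_q15(acc >> 15)  # Q30 -> Q15, then clamp to 16 bits


x = [0.5, -0.25, 0.125]
h = [0.5, 0.5, 0.5]
print(dot_float(x, h))                                                # 0.1875
print(from_q15(dot_q15([to_q15(v) for v in x], [to_q15(v) for v in h])))  # 0.1875

# Overflow case: the true result (about 1.62) lies outside the Q15 range [-1, 1)
big = [0.9, 0.9]
print(dot_float(big, big))
print(from_q15(dot_q15([to_q15(v) for v in big], [to_q15(v) for v in big])))  # saturates to 0.999969482421875
```

The floating-point version handles the second case without any special effort, while the fixed-point version silently clips near 1.0; avoiding that (for example by pre-scaling inputs or choosing a different Q format) is exactly the kind of analysis that makes conversion costly for complex algorithms.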

BDTI obtained an early sample of one of the new C66x chips to confirm previously simulated results on the BDTI DSP Kernel Benchmarks™. (Full benchmark results are available on BDTI’s web site.) These results provide an independent evaluation of the performance the new DSP core will deliver. The 1.25 GHz C66x core achieved a fixed-point BDTImark2000 score of 16,690, well above the 13,170 score of TI’s previous-generation C64x+ core and also above the previous best score of 15,420, achieved by the Freescale SC3850 core used in the MSC815x/825x chip families. [Editor’s note: As this article was going to press, Freescale announced a faster version of the SC3850 core that raises its clock speed from 1.0 to 1.2 GHz. BDTI has issued a BDTIsimMark2000 score of 18,500 for the 1.2 GHz Freescale core, which Freescale says will begin sampling to customers next month.]
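The relative margins implied by these scores can be worked out directly. The snippet below is only arithmetic on the numbers quoted in this article. The score-per-MHz figure is a rough normalization added here for illustration, not a BDTI metric, and since no clock speed is given in this article for the C64x+, it is left out of that comparison.

```python
from typing import Optional

# (score, core clock in MHz) as quoted in the article; None = clock not stated here
SCORES = {
    "C66x": (16_690, 1250),
    "C64x+": (13_170, None),
    "SC3850 @ 1.0 GHz": (15_420, 1000),
    "SC3850 @ 1.2 GHz (simulated)": (18_500, 1200),
}


def ratio(a: str, b: str) -> float:
    """Score of processor a divided by score of processor b."""
    return SCORES[a][0] / SCORES[b][0]


def per_mhz(name: str) -> Optional[float]:
    """Score per MHz of clock: a rough normalization, not a BDTI metric."""
    score, mhz = SCORES[name]
    return None if mhz is None else score / mhz


print(f"C66x vs C64x+:            {ratio('C66x', 'C64x+'):.3f}x")  # 1.267x
print(f"C66x vs SC3850 @ 1.0 GHz: {ratio('C66x', 'SC3850 @ 1.0 GHz'):.3f}x")  # 1.082x
print(f"SC3850 @ 1.2 GHz vs C66x: {ratio('SC3850 @ 1.2 GHz (simulated)', 'C66x'):.3f}x")  # 1.108x
for name in SCORES:
    value = per_mhz(name)
    print(f"{name}: " + ("clock not stated" if value is None else f"{value:.2f} per MHz"))
```

On these numbers the C66x leads the 1.0 GHz SC3850 by about 8% in absolute score while scoring lower per MHz (about 13.4 versus 15.4). The two SC3850 entries give nearly the same per-MHz value, which suggests the simulated 1.2 GHz score tracks clock speed.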

On floating-point benchmarks, the C66x achieved a BDTImark2000 score of 10,720, far above the performance of previous-generation floating-point DSPs. So while the C66x performs significantly better on fixed-point than on floating-point DSP tasks, its floating-point performance is quite strong. This lets developers create initial implementations using floating-point math and then decide whether performance-critical sections of the code should be migrated to fixed-point for higher speed. Having both on the same chip should prove to be a real advantage, as TI is the only supplier offering high-performance, multi-core DSP chips that support both fixed- and floating-point math.
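One rough way to frame the "start in floating point, migrate the hot spots later" decision is Amdahl's law, a standard estimate that neither TI nor BDTI prescribes here. The sketch assumes, as a simplification, that migrated code speeds up by the ratio of the C66x's fixed- and floating-point BDTImark2000 scores (16,690 / 10,720, about 1.56x). Real application code will not necessarily track kernel-benchmark ratios, and the migrated fractions below are hypothetical.

```python
FIXED_SCORE = 16_690  # C66x fixed-point BDTImark2000 (from the article)
FLOAT_SCORE = 10_720  # C66x floating-point BDTImark2000 (from the article)

# Assumption: migrated code runs faster by the ratio of the two kernel scores.
KERNEL_SPEEDUP = FIXED_SCORE / FLOAT_SCORE


def overall_speedup(fraction_migrated: float, s: float = KERNEL_SPEEDUP) -> float:
    """Amdahl's law: overall speedup when fraction_migrated of the original
    (all floating-point) run time is made s times faster and the rest is unchanged."""
    if not 0.0 <= fraction_migrated <= 1.0:
        raise ValueError("fraction_migrated must be in [0, 1]")
    return 1.0 / ((1.0 - fraction_migrated) + fraction_migrated / s)


print(f"float/fixed score ratio: {FLOAT_SCORE / FIXED_SCORE:.2f}")  # 0.64
for f in (0.0, 0.5, 0.8, 1.0):
    print(f"migrate {f:.0%} of run time -> {overall_speedup(f):.2f}x")
```

In this model, migrating 80% of the run time yields roughly 1.4x rather than the full 1.56x, which is one reason to profile first and convert only the sections where the gain justifies the effort.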

Of course, design teams will still have to evaluate whether the cost and power profiles of the new TI chips match their applications. TI is quoting a starting price of $99 in 1,000-unit volumes for the TMS320C6672 dual-core, general-purpose DSP, which is the lowest-performance chip in the new line-up. TI has yet to detail power consumption specs but has said that an 8-core device operating at 1.25 GHz will dissipate around 10W.
