Fixed-Point DSP Processors
BDTI

Texas Instruments TMS320C55x

The TMS320C55x is a 16-bit fixed-point packaged DSP processor family from Texas Instruments, announced in February 2000. It can execute up to two instructions in parallel, with instruction widths varying from 8 to 48 bits depending on the number of operands and parallel operations. The TMS320C55x is based on the earlier TMS320C54x family but adds significant enhancements to the older processor's architecture and instruction set. The TMS320C55x is partially assembly-code compatible with the TMS320C54x.

The TMS320C55x is intended for use in applications that require a combination of strong DSP performance and high energy efficiency. Target applications include cellular telephones and modems, as well as telecom infrastructure applications such as voice-over-IP gateways and multi-channel modem banks. The TMS320C55x interfaces directly to SDRAM, making it well suited for use in portable consumer products where large memory buffers are required, e.g., digital cameras and CD-ROM-based portable digital audio players. The fastest family members execute at 300 MHz at 1.26 volts.

Architecture

The TMS320C55x architecture consists of a 16-bit fixed-point data path, an address unit, a program flow control unit, and an instruction buffer unit. The processor provides four 40-bit accumulators, four 16-bit data registers, and eight 16-bit auxiliary registers primarily used for addressing.

The TMS320C55x data path contains four main execution units: a 40-bit ALU, a 40-bit barrel shifter, and two 17 × 17-bit MAC units. (In contrast, the TMS320C54x has only one MAC unit.) The multipliers are not fully independent because they share one input. Like the ALU in the TMS320C54x, the TMS320C55x 40-bit ALU supports SIMD dual additions and subtractions by treating 32-bit memory operands as packed 16-bit data. The 40-bit ALU also supports dual 16-bit maximum and minimum operations. The TMS320C55x address unit contains a 16-bit ALU that is independent from the processor’s three address generators and can be used in parallel with the main data path.
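The two data-path features described above can be illustrated with a small behavioral sketch. This is not TMS320C55x code and not a cycle-accurate model: the function names (`pack16`, `dual_add16`, `dual_mac`) are hypothetical, and the sketch assumes simple wrap-around arithmetic, ignoring any saturation or rounding modes the processor may offer.

```python
MASK16 = 0xFFFF


def to_signed(value, bits):
    """Interpret the low `bits` of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def pack16(hi, lo):
    """Pack two 16-bit values into one 32-bit word."""
    return ((hi & MASK16) << 16) | (lo & MASK16)


def dual_add16(a, b):
    """Add two packed 32-bit words as two independent 16-bit lanes.

    Each lane wraps on overflow; no carry crosses from the low lane
    into the high lane (assumption: no saturation).
    """
    hi = ((a >> 16) + (b >> 16)) & MASK16
    lo = ((a & MASK16) + (b & MASK16)) & MASK16
    return (hi << 16) | lo


def dual_mac(acc0, acc1, x0, x1, shared):
    """Two multiply-accumulates per step that share one multiplier input.

    The accumulators are kept to 40 bits, i.e. 8 bits of headroom above
    a 32-bit product (assumption: wrap-around rather than saturation).
    """
    acc0 = to_signed(acc0 + x0 * shared, 40)
    acc1 = to_signed(acc1 + x1 * shared, 40)
    return acc0, acc1


# Low lane 0xFFFF + 1 wraps to 0 without disturbing the high lane (1 + 2).
print(hex(dual_add16(pack16(1, 0xFFFF), pack16(2, 1))))

# Two running sums updated with the same shared operand.
print(dual_mac(0, 0, 3, -4, 5))
```

The shared input in `dual_mac` mirrors the restriction that the two multipliers are not fully independent: both products in a step must use one common operand, as when two filter outputs are computed against the same coefficient.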

The TMS320C55x uses a fully interlocked seven-stage pipeline. Most TMS320C55x instructions have single-cycle throughput. Many program control instructions force pipeline flushes, and thus require four to seven cycles to execute. Program control instructions include all branches, subroutine calls, and return instructions (i.e., subroutine and interrupt returns).

The TMS320C55x memory system implements a modified Harvard architecture with separate program and data memory spaces. The TMS320C55x fetches instructions using a 24-bit program memory address bus and a 32-bit program memory data bus. The TMS320C55x includes five unidirectional data memory bus sets: three data read bus sets and two data write bus sets. Each bus set includes a 24-bit address bus and a 16-bit data bus. The write buses support storage of two 16-bit words from separate accumulators or one 32-bit word from one accumulator. Maximum on-chip data read bandwidth is 900 million 16-bit reads per second at 300 MHz.
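The read-bandwidth figure follows from the bus counts given above. As a quick arithmetic check, assuming each read bus completes one 16-bit transfer per clock cycle (the helper name is illustrative only):

```python
def peak_transfers_per_second(buses, clock_hz):
    """Peak transfers per second, assuming one transfer per bus per cycle."""
    return buses * clock_hz


READ_BUSES = 3           # three data read bus sets
CLOCK_HZ = 300_000_000   # 300 MHz
WORD_BYTES = 2           # each data bus is 16 bits wide

reads = peak_transfers_per_second(READ_BUSES, CLOCK_HZ)
print(reads)               # 16-bit reads per second
print(reads * WORD_BYTES)  # the same peak expressed in bytes per second
```

This is a peak figure; sustained bandwidth depends on whether an algorithm can actually keep all three read buses busy every cycle.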

The TMS320C55x contains a 128-entry instruction buffer that fetches 32 bits of instruction data per cycle, partially decodes the instruction data, and dispatches up to two instructions per cycle. Because the maximum issue width (48 bits) is greater than the fetch width (32 bits), the instruction buffer may run out of instruction data, causing a stall.
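The mismatch between fetch width and issue width can be sketched with a toy model. The function below is hypothetical and deliberately simplified: it tracks only the number of buffered instruction bits, ignores the buffer's capacity limit, partial decoding, and branches, and assumes that each cycle 32 bits arrive before the dispatch attempt.

```python
def count_stalls(issue_widths, fetch_bits=32, prefill_bits=0):
    """Count stall cycles for a sequence of per-cycle issue widths.

    issue_widths: total instruction bits to dispatch at each issue
    step (8 to 48). If the buffer holds too few bits, the model waits
    (stalls) one cycle at a time while further fetches arrive.
    """
    buffered = prefill_bits
    stalls = 0
    for width in issue_widths:
        buffered += fetch_bits
        while buffered < width:
            stalls += 1
            buffered += fetch_bits
        buffered -= width
    return stalls


# Sustained 48-bit issue from an empty buffer outruns the 32-bit fetch.
print(count_stalls([48] * 4))

# The same sequence with 64 bits already buffered.
print(count_stalls([48] * 4, prefill_bits=64))

# 32-bit issue never exceeds the fetch rate.
print(count_stalls([32] * 4))
```

In this model, buffered data hides the 16-bit-per-cycle deficit only for a while: a long enough run of maximum-width issue eventually drains the buffer.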

The TMS320C55x includes three address generation units and can generate three addresses per cycle. The processor supports register-direct and register-indirect addressing of 8-, 16-, and 32-bit data. Other addressing modes include bit-reversed and modulo addressing. Immediate data is also supported.
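For readers unfamiliar with the two special addressing modes, the index sequences they produce can be sketched as follows. The helper names are hypothetical, and the code shows only the resulting access order, not how the address generation hardware computes it.

```python
def bit_reverse(index, bits):
    """Reverse the low `bits` bits of index (the access order used by FFTs)."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)
        index >>= 1
    return result


def circular_next(offset, step, length):
    """Advance an offset within a circular (modulo) buffer of `length` items."""
    return (offset + step) % length


# Access order for an 8-point FFT buffer.
print([bit_reverse(i, 3) for i in range(8)])

# Stepping past the end of an 8-entry delay line wraps to the start.
print(circular_next(6, 3, 8))
```

Bit-reversed addressing saves the explicit reordering pass in FFT routines, and modulo addressing lets filter delay lines wrap without a compare-and-reset in the inner loop.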

The TMS320C55x supports three levels of nested hardware loops. Additional levels of loop nesting can be implemented using conditional branches or by storing and restoring hardware loop control registers. Both single- and multi-instruction hardware loops are supported. All loops are interruptible.

Peripherals

All current TMS320C55x family members include the following peripherals: a host port interface, three serial ports (two on the TMS320VC5501), two or three timers, and a six-channel DMA controller. Most also include a watchdog timer, and some include a real-time clock. I2C interfaces and UARTs are available on some parts. The TMS320VC5507 and TMS320VC5509A include a USB 2.0 port, and the TMS320VC5509A includes a MultiMedia Card/Secure Digital serial port.

Power Consumption

Power consumption for TMS320C55x chips varies by family member. The TMS320VC5509A consumes 62.2 mW at 108 MHz and 1.2 volts. This measurement is based on a typical DSP workload, and includes power for the PLL; peripherals are disabled. (The peripheral clock can be enabled or disabled for individual peripherals.)
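To compare this figure with other processors, it is often normalized by clock rate. A minimal sketch of that arithmetic, using only the numbers quoted above (the function name is illustrative):

```python
def mw_per_mhz(power_mw, clock_mhz):
    """Power normalized by clock rate.

    mW per MHz is numerically equal to nanojoules per clock cycle:
    (1e-3 J/s) / (1e6 cycles/s) = 1e-9 J per cycle.
    """
    return power_mw / clock_mhz


# TMS320VC5509A figure quoted above: 62.2 mW at 108 MHz and 1.2 volts.
print(round(mw_per_mhz(62.2, 108), 3))
```

The normalized value applies only at the stated voltage and workload; it should not be scaled to other operating points without measured data.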

Cost

Pricing for TMS320C55x chips in 10,000-unit quantities ranges from about $5 (for the TMS320VC5501 at 300 MHz) to $19 (for the TMS320VC5510A at 200 MHz).

For Additional Information

The TMS320C55x achieves a BDTImark2000™ score of 1460 at 300 MHz. A complete analysis of this processor, including BDTI Benchmark™ results, is contained in BDTI’s report, Buyer’s Guide to DSP Processors, 2004 Edition.

Last updated January 2005.
