Fixed-Point DSP Processors
BDTI

Texas Instruments TMS320C8x

Texas Instruments TMS320C8x family members combine multiple fixed-point DSP processors and a RISC controller on the same chip. Each of the on-chip processors is capable of 60 MIPS at 3.3 volts. TMS320C8x devices are targeted at high-performance applications such as video and image processing and cellular telephone base stations. The first member of the TMS320C8x family, the TMS320C80, was introduced in 1995, and features four fixed-point DSPs. The other member of the family, the TMS320C82, is a lower-cost version of the TMS320C80. It includes the same RISC controller as the TMS320C80 but has only two on-chip fixed-point DSPs.

The TMS320C8x controller processor, called the "Master Processor," is a 32-bit general-purpose processor. The Master Processor includes an IEEE-754 compliant floating-point unit. Although it uses a RISC-like architecture, the Master Processor also has some DSP-specific features. For example, a floating-point multiply-accumulate (MAC) instruction is provided. Most Master Processor floating-point operations have single-cycle throughput.

The fixed-point DSP processors on the TMS320C8x are called "Parallel Processors." The Parallel Processors use 64-bit instruction words and SIMD operations in the data path, which can be configured to support 8-bit, 16-bit, and, to some extent, 32-bit data. The multiple processors on the TMS320C8x have individual instruction caches and execute independently of each other. The processors communicate via shared memory and by sending each other interrupt requests.
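To illustrate what SIMD operation on packed 8-bit or 16-bit data means, the following C sketch emulates lane-wise addition inside a single 32-bit word in software. The function names `add_4x8` and `add_2x16` are hypothetical, and the masking trick is only a portable emulation; the source does not describe how the Parallel Processor hardware splits its data path.

```c
#include <stdint.h>

/* Add four unsigned 8-bit lanes packed into 32-bit words.
 * Each lane wraps modulo 256; no carry crosses a lane boundary. */
uint32_t add_4x8(uint32_t a, uint32_t b)
{
    /* Add the low 7 bits of every lane, so carries stop at bit 7. */
    uint32_t low = (a & 0x7F7F7F7Fu) + (b & 0x7F7F7F7Fu);
    /* Fold the top bit of every lane back in with XOR. */
    return low ^ ((a ^ b) & 0x80808080u);
}

/* Add two unsigned 16-bit lanes packed into 32-bit words.
 * Each lane wraps modulo 65536. */
uint32_t add_2x16(uint32_t a, uint32_t b)
{
    uint32_t low = (a & 0x7FFF7FFFu) + (b & 0x7FFF7FFFu);
    return low ^ ((a ^ b) & 0x80008000u);
}
```

For example, `add_4x8(0x01FF7F80, 0x01018080)` yields `0x0200FF00`: the second lane wraps from 0xFF to 0x00 without disturbing the first.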

The Parallel Processor data path consists of a multiplier, a 32-bit, three-input ALU, and eight 32-bit general-purpose registers. Additionally, a barrel rotator, a mask generator, and a "multiple-flags" (MF) register are provided for bit-manipulation and shifting operations. The barrel rotator is similar to a barrel shifter, but it rotates, rather than shifts, its operand by an arbitrary number of bits in one instruction cycle. The multiplier performs one 16x16->32-bit multiplication or two 8x8->16-bit multiplications in a single instruction cycle. The multiplier supports signed/signed or unsigned/unsigned (but not signed/unsigned) multiplies when 16-bit input operands are used, and signed/unsigned or unsigned/unsigned (but not signed/signed) multiplies when 8-bit input operands are used.
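The difference between rotating and shifting, and the two multiplier modes described above, can be sketched in C as follows. This is a behavioral illustration only: the function names are hypothetical, and the way the two 8x8 products are packed into one 32-bit result is an assumption, since the source does not specify the operand or result layout.

```c
#include <stdint.h>

/* Rotate left: bits shifted out of the top re-enter at the bottom,
 * whereas a plain left shift would discard them. */
uint32_t rotate_left32(uint32_t x, unsigned n)
{
    n &= 31u;                       /* rotation count is taken modulo 32 */
    return n ? (x << n) | (x >> (32u - n)) : x;
}

/* One 16x16->32-bit multiply, signed/signed. */
int32_t mul_16x16_signed(int16_t a, int16_t b)
{
    return (int32_t)a * (int32_t)b;
}

/* One 16x16->32-bit multiply, unsigned/unsigned. */
uint32_t mul_16x16_unsigned(uint16_t a, uint16_t b)
{
    return (uint32_t)a * (uint32_t)b;
}

/* Two 8x8->16-bit unsigned/unsigned multiplies.
 * ASSUMPTION: the products are packed as [hi product | lo product]. */
uint32_t mul_dual_8x8_unsigned(uint8_t a_hi, uint8_t a_lo,
                               uint8_t b_hi, uint8_t b_lo)
{
    uint32_t hi = (uint32_t)a_hi * b_hi;   /* at most 0xFE01, fits in 16 bits */
    uint32_t lo = (uint32_t)a_lo * b_lo;
    return (hi << 16) | lo;
}
```

Rotating `0x80000001` left by one bit gives `0x00000003`, while shifting it left by one bit gives `0x00000002` because the top bit is lost.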

The Parallel Processor three-input ALU is capable of executing all 256 possible Boolean functions of three input variables. This allows operations that may take several instructions on most DSPs to be performed in a single ALU operation.
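The figure of 256 follows from the fact that a Boolean function of three inputs has 2^3 = 8 input combinations, and each combination can map to 0 or 1, giving 2^8 = 256 functions. The C sketch below models such an ALU by selecting the function with an 8-bit truth table and applying it bitwise to three 32-bit operands. The function name `alu3_logic` and the truth-table encoding are assumptions made for illustration; they are not the TMS320C8x opcode encoding.

```c
#include <stdint.h>

/* Apply an arbitrary three-input Boolean function bitwise.
 * Bit i of truth_table is the output for the input combination
 * i = (a_bit << 2) | (b_bit << 1) | c_bit.  (Hypothetical encoding.) */
uint32_t alu3_logic(uint8_t truth_table, uint32_t a, uint32_t b, uint32_t c)
{
    uint32_t result = 0;
    for (unsigned idx = 0; idx < 8; ++idx) {
        if (truth_table & (1u << idx)) {
            /* Build the minterm that is 1 exactly where (a,b,c) == idx. */
            uint32_t ta = (idx & 4u) ? a : ~a;
            uint32_t tb = (idx & 2u) ? b : ~b;
            uint32_t tc = (idx & 1u) ? c : ~c;
            result |= ta & tb & tc;
        }
    }
    return result;
}
```

With this encoding, table `0xCA` computes the bitwise select `(a & b) | (~a & c)`, and table `0x96` computes `a ^ b ^ c`. Written with two-input operators, the select takes three operations and an inversion; here it is one.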

The Parallel Processor supports rounding by allowing the 32-bit result of a 16-bit by 16-bit multiplication to be shifted left by up to three bits and then rounded to 16 bits. Saturation support is not provided.
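A minimal C sketch of this shift-then-round sequence is shown below. The function name `mul_shift_round` is hypothetical, and the rounding method (adding half of a 16-bit LSB, 0x8000, before keeping the upper 16 bits) is an assumption; the source states only that the shifted product is rounded to 16 bits. Because saturation is not provided, the sketch lets overflowing results wrap.

```c
#include <assert.h>
#include <stdint.h>

/* Multiply two 16-bit values, shift the 32-bit product left by
 * 0..3 bits, then round to the upper 16 bits.
 * ASSUMPTION: rounding adds 0x8000 before truncation.
 * No saturation: an overflowing result wraps around. */
int16_t mul_shift_round(int16_t a, int16_t b, unsigned shift)
{
    assert(shift <= 3u);
    int32_t product = (int32_t)a * (int32_t)b;
    /* Shift and add in unsigned arithmetic so wraparound is well defined. */
    uint32_t shifted = (uint32_t)product << shift;
    uint32_t rounded = shifted + 0x8000u;
    /* Keep the upper 16 bits; the final conversion relies on the usual
     * two's-complement behavior of C compilers. */
    return (int16_t)(uint16_t)(rounded >> 16);
}
```

With Q15 fractional operands and a shift of one bit, `mul_shift_round(16384, 16384, 1)` returns 8192 (0.5 x 0.5 = 0.25). The lack of saturation shows up in `mul_shift_round(-32768, -32768, 1)`, which returns -32768 rather than the largest positive value.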

The TMS320C8x has a single four Gbyte, byte-addressable memory space. The same memory space is used for program and data accesses, and is shared by all of the processors on the chip.

The TMS320C80 contains 50 Kbytes of on-chip RAM, divided into 25 blocks of 2 Kbytes each. Each of the five processors on the TMS320C80 has five such blocks associated with it. The TMS320C82 contains 44 Kbytes of on-chip RAM, divided into 11 blocks of 4 Kbytes each.

On both devices, each on-chip RAM block is capable of one 64-bit-wide access per instruction cycle. A crossbar network of switches allows a different processor to access each RAM during each instruction cycle. Each processor can access multiple RAMs in a single instruction cycle, although only one processor can access any individual RAM in one instruction cycle.

The Master Processor is capable of up to two accesses per instruction cycle: one 32-bit instruction fetch, and one 64-bit data read or write. Each Parallel Processor is capable of up to three accesses per instruction cycle: one 64-bit instruction fetch, one local 32-bit data read or write, and one global 32-bit data read or write. Thus, each Parallel Processor can move up to 16 bytes per instruction cycle, for an on-chip memory bandwidth of 960 Mbytes/second at a clock rate of 60 MHz.

The TMS320C80 provides one external 32-bit address bus and one external 64-bit data bus. On the TMS320C82, addresses and data are multiplexed over the same external 64-bit bus. Additionally, the TMS320C82 provides a 16-bit bus over which it multiplexes the "row" and "column" portions of the memory address for interfacing to dynamic RAM. Peak off-chip memory bandwidth of TMS320C8x devices is 480 Mbytes/second at a clock rate of 60 MHz.
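The on-chip and off-chip bandwidth figures quoted above can be reproduced with simple arithmetic: total bytes moved per cycle multiplied by the clock rate in MHz. The C sketch below uses hypothetical helper names and assumes that 1 Mbyte means 10^6 bytes; the off-chip figure is consistent with one 64-bit transfer per clock cycle, which is an inference from the quoted numbers rather than a statement in the source.

```c
#include <stddef.h>
#include <stdint.h>

/* Sum the widths (in bits) of all accesses made in one cycle
 * and return the total in bytes. */
uint32_t bytes_per_cycle(const uint32_t *access_bits, size_t count)
{
    uint32_t bits = 0;
    for (size_t i = 0; i < count; ++i)
        bits += access_bits[i];
    return bits / 8u;
}

/* bytes/cycle x clock in MHz = Mbytes/second (1 Mbyte = 10^6 bytes). */
uint32_t peak_mbytes_per_second(uint32_t bytes, uint32_t clock_mhz)
{
    return bytes * clock_mhz;
}
```

Passing the Parallel Processor's three accesses (64, 32, and 32 bits) with a 60 MHz clock gives 16 bytes per cycle and 960 Mbytes/second; a single 64-bit external transfer gives 8 bytes per cycle and 480 Mbytes/second.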

Each Parallel Processor has two identical address generation units. The TMS320C8x supports register-direct, memory-direct, register-indirect, and indexed register-indirect addressing, and immediate data.

The Master Processor does not support hardware looping. Each Parallel Processor supports up to three nested multi-instruction hardware loops.

The TMS320C8x provides several on-chip peripherals, including a "Transfer Controller," which loads instructions and data into the various processors' caches when cache misses occur and performs other memory transfer operations. Additionally, the Master Processor contains a 32-bit timer. The TMS320C80 also includes a Video Controller (VC) peripheral.

As of March 1997, the 60 MIPS TMS320C82 in a 352-pin BGA package sold for $150.00 in quantity 1,000. For a complete evaluation of this processor, including BDTI Benchmark™ results, contact BDTI.
