Fixed-Point DSP Processors
Texas Instruments TMS320C64x

The TMS320C64x is a 16-bit fixed-point family of packaged DSP processors from Texas Instruments. Announced in February 2000, the TMS320C64x is an extension to Texas Instruments’ earlier TMS320C62x architecture. Its instruction set is a superset of that of the TMS320C62x and adds significant SIMD processing capabilities, among other enhancements. The TMS320C64x family targets high-performance applications such as wireless base stations, digital subscriber loops, multi-line modems, ISDN modems, imaging, 3D imaging applications, video applications, and radar and sonar systems. The fastest TMS320C64x family members execute at 1 GHz with a 1.2-volt core supply and 3.3-volt I/O.

Architecture

The core architecture of the TMS320C64x family consists of two fixed-point data paths, a program control unit (including program fetch, instruction dispatch, and instruction decode units), and program and data memory interfaces.

The TMS320C64x is a VLIW architecture with eight execution units, including two multipliers and four ALUs. Using its eight execution units, the processor can execute up to eight 32-bit instructions in a single clock cycle, allowing it to achieve a high level of parallelism. The TMS320C64x is able to perform four 16-bit multiplications in parallel. All execution units in the TMS320C64x have a throughput of one cycle and latencies from one to several cycles depending on the instruction.

The TMS320C64x has two register files containing a total of sixty-four 32-bit general-purpose registers, twice as many as the TMS320C62x. Each of the two register files has eleven 32-bit read ports and eight 32-bit write ports, allowing multiple concurrent accesses from a group of parallel instructions to various registers of each register file. As on the TMS320C62x, two simultaneous memory accesses on the TMS320C64x cannot have source or destination operands in the same register file.

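
As a back-of-the-envelope illustration of the per-cycle figures above, the following Python sketch computes the peak rates they imply at a given clock speed. The function and constant names are hypothetical and exist only for this example, and the results are theoretical upper bounds rather than measured performance.

```python
# Peak-rate arithmetic using the per-cycle figures quoted in the text.
# Names are illustrative; results are theoretical upper bounds, not benchmarks.

INSTRUCTIONS_PER_CYCLE = 8        # up to eight 32-bit instructions per clock cycle
MULTIPLIES_16BIT_PER_CYCLE = 4    # four 16-bit multiplications in parallel


def peak_millions_per_second(per_cycle, clock_mhz):
    """Peak rate in millions of operations per second at clock_mhz."""
    return per_cycle * clock_mhz


def peak_mips(clock_mhz):
    """Peak instruction rate, in millions of instructions per second."""
    return peak_millions_per_second(INSTRUCTIONS_PER_CYCLE, clock_mhz)


def peak_16bit_multiplies(clock_mhz):
    """Peak 16-bit multiplication rate, in millions per second."""
    return peak_millions_per_second(MULTIPLIES_16BIT_PER_CYCLE, clock_mhz)
```

At the 1 GHz clock quoted for the fastest family members, this gives 8000 million instructions per second and 4000 million 16-bit multiplications per second. Real code reaches these peaks only when every execution unit is kept busy on every cycle.
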
The TMS320C64x supports 32-bit signed or unsigned addition or subtraction with optional saturation. It also supports dual 16-bit and quad 8-bit SIMD additions and comparisons. Supported logical operations include AND, OR, exclusive-OR, and NOT. Multiply instructions include mixed signed/unsigned 16-bit multiplications with 32-bit results, dual 16-bit SIMD multiplications, and quad 8-bit SIMD multiplications (producing four packed 16-bit results in a 64-bit register pair).

In addition to SIMD-style multiplications, the TMS320C64x supports dual and quad SIMD dot products, in which the products of dual 16-bit × 16-bit or quad 8-bit × 8-bit SIMD multiplications are summed and the result is placed in a 32-bit register. The TMS320C64x supports dual 16-bit arithmetic shift right operations, in which the signed 16-bit lower and upper halves of a 32-bit register are shifted right individually. The TMS320C64x also supports packed-data manipulation, and it includes some special-purpose instructions: absolute value, exponent detection, normalization, a conditionally executed subtract and shift to support division, and other application-specific instructions.

All load and store instructions on the TMS320C64x allow the address pointer to be pre- or post-modified as part of the instruction. The modified address is updated with single-cycle latency (in contrast to the actual load itself, which has a latency of five cycles). Non-aligned loads and stores are supported on the TMS320C64x through specific instructions.

The TMS320C64x uses an 11-stage non-interlocked pipeline. Instruction latencies vary from one to five cycles. Multiplies, for example, have two-cycle latencies, while data loads have five-cycle latencies. Both have single-cycle throughput, however. All branches are delayed branches, and introduce five delay slots.

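
To make the packed-data operations described in the instruction-set paragraphs above more concrete, here is a minimal behavioral sketch in Python of three of them: a dual 16-bit addition, a dual 16-bit × 16-bit dot product, and a dual 16-bit arithmetic shift right. It models the arithmetic only. The function names are hypothetical (they are not TI mnemonics or compiler intrinsics), and the wraparound behavior on overflow is an assumption of this model, since the text does not say how these instructions handle overflow.

```python
# Behavioral model of dual 16-bit SIMD operations on a 32-bit register value.
# Function names are illustrative; overflow handling (wraparound) is an assumption.

MASK16 = 0xFFFF
MASK32 = 0xFFFFFFFF


def to_signed16(value):
    """Interpret the low 16 bits of value as a two's-complement number."""
    value &= MASK16
    return value - 0x10000 if value & 0x8000 else value


def unpack2(word):
    """Split a 32-bit word into (upper, lower) signed 16-bit halves."""
    return to_signed16(word >> 16), to_signed16(word)


def pack2(upper, lower):
    """Pack two 16-bit values into one 32-bit word."""
    return ((upper & MASK16) << 16) | (lower & MASK16)


def dual_add16(a, b):
    """Add the halves independently; no carry crosses from lower to upper."""
    a_hi, a_lo = unpack2(a)
    b_hi, b_lo = unpack2(b)
    return pack2(a_hi + b_hi, a_lo + b_lo)


def dual_dot16(a, b):
    """Multiply matching halves and sum both products into one signed 32-bit result."""
    a_hi, a_lo = unpack2(a)
    b_hi, b_lo = unpack2(b)
    total = (a_hi * b_hi + a_lo * b_lo) & MASK32  # assumed: wraps on overflow
    return total - 0x100000000 if total & 0x80000000 else total


def dual_shift_right16(a, shift):
    """Arithmetic-shift each signed half right by shift bits (0 to 15 assumed)."""
    hi, lo = unpack2(a)
    return pack2(hi >> shift, lo >> shift)
```

For example, `dual_add16(0x0001FFFF, 0x00010001)` returns `0x00020000`: the lower halves (-1 and 1) sum to 0 without disturbing the upper halves, which sum to 2. A plain 32-bit addition of the same operands would instead carry out of the lower half into the upper half.
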
The on-chip memory system of the TMS320C64x implements a modified Harvard architecture providing separate address spaces for program and data memory. Program memory has a 32-bit address bus and 256-bit data bus; data memory has two 32-bit address buses and two 64-bit data buses. The TMS320C64x uses a two-level on-chip memory organization, with L1 program and data caches and a unified (program and data) L2 cache. The L2 cache can also be configured as RAM.

The TMS320C64x supports modulo addressing and can maintain eight concurrently active circular buffers. However, only two unique buffer sizes are allowed simultaneously, and the buffer sizes must be powers of two.

Peripherals

All TMS320C64x family members include a 64-channel enhanced DMA controller, two or three multi-channel buffered serial ports, and three 32-bit timers. Depending on the family member, TMS320C64x chips may also include host ports, PCI interfaces, serial audio ports, I2C ports, video ports, Ethernet MACs, Viterbi decoder coprocessors, turbo decoder coprocessors, and UTOPIA ports.

Power Consumption

Power consumption for TMS320C64x chips varies by family member. The TMS320C6414T consumes 673 mW at 600 MHz and 1.1 volts. This measurement is based on a typical DSP workload. It includes power for the PLL and on-chip peripherals. (The on-chip peripherals cannot be disabled, but they are inactive for this measurement.)

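
Returning briefly to the modulo addressing described in the memory discussion above: the power-of-two restriction on buffer sizes means that wrapping an index reduces to a bit mask. The Python sketch below is a software model of that idea under stated assumptions, not a description of the processor's addressing hardware. The names `circular_advance` and `DelayLine` are hypothetical and introduced only for illustration.

```python
# Software model of modulo (circular) addressing with power-of-two buffer sizes.
# This illustrates the wrapping arithmetic only; names are hypothetical.


def is_power_of_two(n):
    """True when n is a positive power of two."""
    return n > 0 and (n & (n - 1)) == 0


def circular_advance(index, step, size):
    """Move index by step (positive or negative), wrapping within size entries."""
    if not is_power_of_two(size):
        raise ValueError("buffer size must be a power of two")
    # For a power-of-two size, masking is equivalent to taking the modulus.
    return (index + step) & (size - 1)


class DelayLine:
    """Fixed-size sample history, such as an FIR filter's delay line."""

    def __init__(self, size):
        if not is_power_of_two(size):
            raise ValueError("buffer size must be a power of two")
        self.samples = [0] * size
        self.newest = size - 1  # so the first push lands at index 0

    def push(self, sample):
        """Overwrite the oldest entry with the new sample."""
        self.newest = circular_advance(self.newest, 1, len(self.samples))
        self.samples[self.newest] = sample

    def recent(self, count):
        """Return the count most recent samples, newest first."""
        size = len(self.samples)
        return [self.samples[circular_advance(self.newest, -k, size)]
                for k in range(count)]
```

After pushing the samples 1 through 6 into a `DelayLine(4)`, `recent(4)` returns `[6, 5, 4, 3]`: the two oldest samples have been overwritten in place, and no data was copied to make room for the new ones. With modulo addressing in hardware, the wrap is handled as part of the address update rather than by explicit masking in software.
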
Cost

Pricing for TMS320C64x chips in 10,000-unit quantities ranges from about $18 (for the TMS320C6410 at 400 MHz) to about $220 (for the TMS320C6416T at 1 GHz).

For Additional Information

The TMS320C64x achieves a BDTImark2000™ score of 9130 at 1000 MHz. A complete analysis of this processor, including BDTI Benchmark™ results, is contained in BDTI’s report, Buyer’s Guide to DSP Processors, 2004 Edition.

Last updated January 2005.