Floating-Point DSP Processors
Texas Instruments TMS320C67x

The TMS320C67x family is the floating-point version of Texas Instruments’ TMS320C62x family of fixed-point DSPs. Like the TMS320C62x, the TMS320C67x is based on a VLIW architecture which allows it to execute up to eight RISC-like instructions per clock cycle. It is capable of executing all TMS320C62x instructions and has added support for floating-point arithmetic and 64-bit data.

The TMS320C67x family currently includes the TMS320C6701, the TMS320C6711, the TMS320C6712, and the TMS320C6713. The fastest TMS320C67x family member, the TMS320C6713, operates at 300 MHz and uses a 1.4-volt core supply.

The TMS320C67x is upward compatible with the TMS320C62x; the TMS320C67x can execute TMS320C62x object code unmodified, but the TMS320C62x cannot execute all TMS320C67x instructions. The TMS320C67x is only partly compatible with the TMS320C64x, Texas Instruments’ next generation of the fixed-point TMS320C6xxx architecture, since the TMS320C64x extends the TMS320C62x instruction set with instructions that are not supported by the TMS320C67x.

Architecture

The two TMS320C67x floating-point data paths provide a superset of the functionality of the fixed-point data paths of the TMS320C62x, adding support for IEEE-754 32-bit single-precision and 64-bit double-precision floating-point arithmetic. Each data path includes a set of four execution units, a general-purpose register file, and paths for moving data between memory and registers.

The four execution units in each data path contain two ALUs, a multiplier, and an adder/subtractor which is used for address generation. The ALUs support both integer and floating-point operations, and the multipliers can perform both 16 × 16-bit and 32 × 32-bit integer multiplies and 32-bit and 64-bit floating-point multiplies. The two register files each contain sixteen 32-bit general-purpose registers. These registers can be used for storing addresses or data.
To support 64-bit floating-point arithmetic, pairs of adjacent registers can be used to hold 64-bit data.

The on-chip memory system of the TMS320C67x implements a modified Harvard architecture, providing separate address spaces for program and data memory. Program memory has a 32-bit address bus and a 256-bit data bus. Each of the two data paths is connected to data memory by a 32-bit address bus and two 32-bit data buses. Since there are two 32-bit data buses for each data path, the TMS320C67x can load two 64-bit or four 32-bit words per instruction cycle. Hence, the maximum sustainable on-chip data memory bandwidth for a 300 MHz TMS320C67x is 1200 million 32-bit words/second.

The TMS320C6701 contains 64 Kbytes of program RAM and 64 Kbytes of data RAM. The program memory can be dynamically configured to act as a direct-mapped program cache. The other family members use a two-level cache organization like that of the fixed-point TMS320C6211. The TMS320C6711, TMS320C6712, and TMS320C6713 each contain two level-one (L1) caches, one for data and one for instructions. Each L1 cache contains 4 Kbytes of memory. The L1 program cache is direct mapped, and has a line size of 64 bytes (two instruction packets); the L1 data cache is two-way set-associative, and has a line size of 32 bytes. The L1 caches are fed by a unified level-two (L2) memory. The TMS320C6711, TMS320C6712, and TMS320C6713 each contain 64 Kbytes of L2 memory. This L2 memory can be configured as SRAM, as a set-associative cache, or as a partitioned combination of the two. In addition to this configurable memory, the TMS320C6713 L2 memory includes 192 Kbytes of SRAM that cannot be configured as cache.

Addressing modes supported by the TMS320C67x include register-direct, register-indirect, indexed register-indirect, and modulo addressing. Immediate data is also supported. The TMS320C67x does not support modulo addressing for 64-bit data.
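The memory figures quoted in this section can be cross-checked with a little arithmetic. The sketch below is only an illustration: the function names are made up for this example, the constants are taken from the text above, and it assumes that 1 Kbyte means 1024 bytes and that every data bus transfers one 32-bit word on every cycle.

```python
# Back-of-the-envelope check of the on-chip memory figures quoted in the text.
# Function names are hypothetical; the constants come from the article.

def peak_data_words_per_second(clock_hz, data_paths=2, buses_per_path=2):
    """32-bit words per second, assuming every data bus moves one word per cycle."""
    return clock_hz * data_paths * buses_per_path


def cache_geometry(size_bytes, line_bytes, ways):
    """Return (lines, sets) for a cache; ways=1 models a direct-mapped cache."""
    lines = size_bytes // line_bytes
    sets = lines // ways
    return lines, sets


if __name__ == "__main__":
    # Two data paths x two 32-bit buses each, at a 300 MHz clock.
    words = peak_data_words_per_second(300_000_000)
    print(f"Peak data bandwidth: {words // 1_000_000} million 32-bit words/s")

    # L1 program cache: 4 Kbytes, direct mapped, 64-byte lines.
    l1p_lines, l1p_sets = cache_geometry(4 * 1024, 64, ways=1)
    print(f"L1 program cache: {l1p_lines} lines in {l1p_sets} sets")

    # L1 data cache: 4 Kbytes, two-way set-associative, 32-byte lines.
    l1d_lines, l1d_sets = cache_geometry(4 * 1024, 32, ways=2)
    print(f"L1 data cache: {l1d_lines} lines in {l1d_sets} sets")
```

Under these assumptions the first result reproduces the 1200 million words/second figure given above. The cache geometry is derived from the quoted sizes, not stated in the source.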
The TMS320C67x does not support hardware looping, and hence all loops must be implemented in software. However, the parallel architecture of the processor allows the implementation of software loops with virtually no overhead.

Peripherals

Peripherals on the TMS320C6701 include a four-channel DMA controller, two TDM-capable buffered serial ports, and two 32-bit timers. TMS320C6711, TMS320C6712, and TMS320C6713 peripherals include a 16-channel enhanced DMA controller, two TDM-capable buffered serial ports, and two 32-bit timers. The TMS320C6713 also includes two audio serial ports and two I2C ports. Except for the TMS320C6712, all TMS320C67x family members also contain a host port.

Power Consumption

According to Texas Instruments, the TMS320C6713 consumes 694 mW at 200 MHz and 1.2 V. This measurement assumes about 80% CPU utilization. It includes power for the core, on-chip memory, PLL, and peripherals. The peripherals are clocked but inactive for this measurement.

Cost

As of the last quarter of 2004, pricing for quantity 10,000 purchases of the TMS320C67x ranged from $13.50 for the 150 MHz TMS320C6712C to $104.80 for the 167 MHz TMS320C6701.

For Additional Information

The TMS320C6713 achieves a BDTImark2000™ score of 1470 at 300 MHz. A complete analysis of this processor, including BDTI Benchmark™ results, is contained in BDTI’s report, Buyer’s Guide to DSP Processors, 2004 Edition.

Last updated January 2005.