Buyer's Guide to DSP Processors
Copyright © 1997 Berkeley Design Technology, Inc. The following is a six-page excerpt from the thirty-one-page Texas Instruments TMS320C62xx analysis from the third (1997) edition of Buyer's Guide to DSP Processors.

7.17 Texas Instruments TMS320C62xx Family
Introduction

The TMS320C62xx is the latest family of fixed-point DSP processors from Texas Instruments. The TMS320C62xx is based on a completely new architecture compared to previous DSP processor families from Texas Instruments. The processor contains eight execution units that include two multipliers and four ALUs. Using these eight execution units, the processor can execute up to eight 32-bit RISC-like instructions in a single clock cycle, enabling it to achieve a high level of parallelism. Instructions operate on 16-, 32-, or 40-bit data. The TMS320C62xx family is targeted at high-performance applications, such as wireless base stations, digital subscriber loops, multi-line modems, and ISDN modems.

Because the TMS320C62xx can execute up to eight instructions per clock cycle, the term ``instruction cycle'' is potentially ambiguous when discussing this processor. As used here, ``instruction cycle'' means the time required to execute a single group of one to eight parallel instructions. On the TMS320C62xx, one instruction cycle is equal in length to one master clock cycle. Additionally, since TMS320C62xx instructions often perform fewer operations than typical instructions on other DSPs, a MIPS comparison between the TMS320C62xx and other DSPs is not meaningful. Therefore, instead of MIPS, we use the number of MACs per second as a shorthand performance metric in this analysis.

The first member of the TMS320C62xx family, the TMS320C6201, was announced in February 1997. Currently, only advance release samples of the TMS320C6201 are available. These samples incorporate a limited set of peripherals, use a 2.5-volt core supply (with 3.3-volt I/O), and execute up to 240 million MACs per second when operating at 120 MHz. According to Texas Instruments, the full-speed production version, the 200 MHz TMS320C6201, will be available in late 1997 and will include a wider array of on-chip peripherals. (This part is referred to as a ``1,600 MIPS'' processor by Texas Instruments, since it is projected to execute a maximum of eight RISC-like instructions per clock cycle at 200 MHz. It will be capable of executing 400 million MACs per second when running at 200 MHz.)

The analysis presented here is based on the advance release version of the TMS320C6201 except where noted. Table 7.17-1 shows the characteristics of the advance release version of the TMS320C6201.
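The peak figures quoted above follow from simple arithmetic, and a short sketch makes the relationship explicit. It assumes two MACs per instruction cycle (the rate the quoted 240 million and 400 million figures imply, presumably one per multiplier) and eight instructions per cycle for the MIPS figure. The function and constant names are illustrative only and do not come from Texas Instruments.

```python
# Peak-rate arithmetic for the TMS320C6201, using only figures quoted in the text.
# Assumption: two MACs per instruction cycle, as the quoted rates imply.
MACS_PER_CYCLE = 2
MAX_INSTRUCTIONS_PER_CYCLE = 8  # one instruction per execution unit


def peak_macs_per_second(clock_hz: int) -> int:
    """Best-case MAC rate: one instruction cycle equals one master clock cycle."""
    return MACS_PER_CYCLE * clock_hz


def peak_mips(clock_hz: int) -> int:
    """Best-case instruction rate in millions of instructions per second."""
    return MAX_INSTRUCTIONS_PER_CYCLE * clock_hz // 1_000_000


if __name__ == "__main__":
    for mhz in (120, 200):
        hz = mhz * 1_000_000
        print(mhz, "MHz:", peak_macs_per_second(hz), "MACs/s,", peak_mips(hz), "MIPS")
```

These are upper bounds that hold only when every cycle issues the maximum number of parallel operations; sustained rates on real code depend on how well the instructions can be scheduled.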
* The core operates at 2.5 volts while the peripherals are 3.3-volt compatible.
By using an architectural approach similar to those of VLIW (very long
instruction word) processors, the TMS320C62xx achieves a high level of
parallelism with a simple architecture. This is done by avoiding the
need for complex instruction scheduling and dispatch hardware in the
processor. Instead, the burden of instruction scheduling is shifted to
the code generation tools or the assembly language programmer. This
results in a simpler and faster processor architecture compared to
processors with dynamic instruction scheduling. VLIW architectures
typically suffer from several disadvantages, such as high program
memory usage and complexity in designing efficient compilers. The
TMS320C62xx architecture includes several features designed to reduce
program memory requirements and alleviate other disadvantages
typically associated with VLIW architectures. These features include
instruction packing, conditional execution for all instructions, and
variable-length instructions, all of which are discussed
below. Despite these features, the TMS320C62xx consumes more program
memory than other fixed-point DSPs, as detailed in our discussion of
benchmark results, below.
The core architecture of the TMS320C62xx family consists of two fixed-point data paths, a program control unit (including program fetch, instruction dispatch, and instruction decode units), and program and data memory interfaces. Figure 7.17-1 illustrates the TMS320C62xx family architecture as typified by the TMS320C6201.

FIGURE 7.17-1. TMS320C6201 processor architecture. Dashed blocks indicate peripherals that are to be added in the production release of the part, according to Texas Instruments.

The TMS320C62xx has two nearly identical data paths. As illustrated in Figure 7.17-2, each data path has a set of four execution units, a general-purpose register file, and paths for moving data between memory and the data path. The execution units in each data path consist of L, S, M, and D units. Typically each unit operates on 32-bit operands, but the L and S units can also operate on 40-bit (``long'') operands. As described below, each execution unit is capable of performing a dedicated set of operations.

FIGURE 7.17-2. TMS320C62xx data paths. Each data path includes four execution units (L, S, M, and D), described in the text. The arrow between the data paths denotes the cross-paths that allow each data path to access the register file of the other data path.
In the best case, all units operate in parallel, and the processor performs four arithmetic operations, two multiplications, and two address calculations in one instruction cycle.
The TMS320C62xx provides two register files, A and B, each containing 16 32-bit general-purpose registers. These registers can be used for storing addresses or data. The registers are labeled A0-A15 for data path one and B0-B15 for data path two. To support 40-bit arithmetic, pairs of adjacent registers can be used to hold 40-bit (``long'') data. In this case the 32 LSBs are stored in an even-numbered register and the 8 MSBs are stored in the 8 LSBs of the next (odd-numbered) register. The remaining bits of the odd-numbered register are zero filled.

The TMS320C62xx implements a load-store architecture: operands must be loaded into the registers before they can be used by the execution units. Generally, the execution units of data path one operate on registers in register file A and the units of data path two operate on registers in register file B. However, the register files are interconnected to the opposite data path's functional units via cross paths. This allows both data paths to fetch one 32-bit operand per instruction cycle from the register file of the other data path.

In each data path, each execution unit has its own read and write ports to its register file. Thus, all the execution units in each data path can access the local register file simultaneously. This means that in an ideal situation all execution units in both data paths operate independently and eight simultaneous operations can be performed. However, some restrictions apply, the most significant of which are:
Overflow protection is supported on the TMS320C62xx via saturation logic and 40-bit arithmetic. Saturation is supported by the L units via special instructions, such as add and subtract with saturation (SADD and SSUB). These instructions perform the indicated arithmetic operation and, in case of overflow, saturate the result to the largest or smallest value that can be represented using 2's complement arithmetic. The saturated result is either a 32- or 40-bit value, depending on the width of the destination register. The L, S, or M unit automatically sets the saturation bit in the control status register when saturation occurs; this bit can only be cleared via an explicit instruction.

The L and S units can operate on 40-bit operands, which corresponds to having a 32-bit register with eight guard bits. A dedicated saturate instruction can be used to convert a 40-bit value to 32 bits and saturate the result. Besides the bit that indicates the occurrence of saturation, no other status bits (carry, negative, etc.) are provided by the TMS320C62xx data paths. The TMS320C62xx does not provide hardware rounding.
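The saturation behavior and the 40-bit register-pair layout described in this section can be modeled in a few lines. The following is a behavioral sketch only, not a cycle-accurate or bit-exact model of the hardware. All function names are hypothetical, and the returned flag merely stands in for the saturation bit in the control status register.

```python
# Behavioral sketch of 32-bit saturation, 40-bit "long" values with eight guard
# bits, and the even/odd register-pair layout. Names are illustrative only.
INT32_MIN, INT32_MAX = -(1 << 31), (1 << 31) - 1
INT40_MIN, INT40_MAX = -(1 << 39), (1 << 39) - 1


def saturate(value, lo, hi):
    """Clamp value to [lo, hi]; the flag models the saturation status bit."""
    if value > hi:
        return hi, True
    if value < lo:
        return lo, True
    return value, False


def sadd32(a, b):
    """Add with saturation to 32 bits, in the spirit of SADD."""
    return saturate(a + b, INT32_MIN, INT32_MAX)


def sat40_to_32(long_value):
    """Convert a 40-bit value to 32 bits, saturating if it does not fit."""
    return saturate(long_value, INT32_MIN, INT32_MAX)


def pack_long(value):
    """Split a signed 40-bit value into (even_reg, odd_reg) bit patterns."""
    if not INT40_MIN <= value <= INT40_MAX:
        raise ValueError("value does not fit in 40 bits")
    bits = value & ((1 << 40) - 1)   # 40-bit two's complement pattern
    even_reg = bits & 0xFFFFFFFF     # 32 LSBs go in the even-numbered register
    odd_reg = (bits >> 32) & 0xFF    # 8 MSBs in the odd register's 8 LSBs, rest zero
    return even_reg, odd_reg


def unpack_long(even_reg, odd_reg):
    """Rebuild the signed 40-bit value from a register pair."""
    bits = ((odd_reg & 0xFF) << 32) | (even_reg & 0xFFFFFFFF)
    if bits & (1 << 39):             # sign bit of the 40-bit value
        bits -= 1 << 40
    return bits
```

For example, summing three copies of the largest 32-bit value overflows a 32-bit register but fits comfortably in 40 bits: `pack_long(3 * INT32_MAX)` yields `(0x7FFFFFFD, 0x01)`, and `sat40_to_32` on the same value returns the clamped result `INT32_MAX` with the flag set. This illustrates why guard bits allow intermediate sums to be accumulated without loss, with saturation applied once at the end.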