DSP on General-Purpose Processors
Copyright © 1997 Berkeley Design Technology, Inc.
6.6 Motorola/IBM PowerPC 604/604e
Introduction
The PowerPC 604 and PowerPC 604e are four-issue superscalar RISC processors from IBM Microelectronics and Motorola. The processors are targeted at general-purpose desktop computing and have found design wins in the Apple Macintosh line of personal computers and in Macintosh clones. The fastest version of the PowerPC 604 operates at a clock speed of 180 MHz with a 3.3-volt supply. The PowerPC 604e is an enhanced version of the PowerPC 604 and can operate at a clock speed of 225 MHz with a 2.5-volt supply for the processor core and a 3.3-volt supply for I/O. PowerPC 604 processors are manufactured and sold by Motorola and IBM Microelectronics and are being licensed to other vendors.
The PowerPC 604 and PowerPC 604e are implementations of the PowerPC architecture specification, jointly developed by Apple, IBM, and Motorola. The PowerPC architecture specification is based on the POWER architecture, defined by IBM in the late 1980s. POWER was the first RISC architecture designed specifically for superscalar implementation. The PowerPC architecture specification has seen a number of implementations from both IBM Microelectronics and Motorola. These different implementations target different application areas, including desktop computing, automotive and industrial control, communications, and other embedded systems.
Different implementations of the PowerPC architecture specification have essentially identical programming models but can vary significantly in implementation and performance. The analysis below applies only to the PowerPC 604 and 604e and does not reflect the performance of other PowerPC variants such as the PowerPC 603. In this report, the term PowerPC 604 refers to both the 604 and 604e unless otherwise noted.
The PowerPC 604 uses a superscalar RISC architecture and can dispatch and complete up to four instructions in a single clock cycle. The processor operates on 32-bit instructions and integer data and on 64-bit double-precision or 32-bit single-precision floating-point data. The PowerPC 604 architecture consists of a program control unit, two simple integer ALUs, one complex integer unit, a floating-point unit, a load/store unit, a branch unit, and instruction and data caches. The PowerPC 604e includes one additional functional unit, the condition register unit (CRU), which performs logical operations on the condition register. Figure 6.6-1 illustrates the PowerPC 604 architecture.
Figure 6.6-1. Simplified PowerPC 604 architecture.
Speculative and out-of-order execution improve the utilization of the PowerPC 604's various functional units. Instruction reordering is facilitated by register renaming. The PowerPC 604 has independent floating-point and integer data paths.
The floating-point data path consists of a fully IEEE-754 compliant floating-point unit and thirty-two 64-bit floating-point registers. The floating-point unit can operate on either 64-bit (double-precision) or 32-bit (single-precision) operands with no difference in speed, except for division operations, which take longer for 64-bit operands. Since the PowerPC 604 uses a strict load/store architecture, all floating-point input operands come from the floating-point register set, and all floating-point results are stored back to floating-point registers.
The floating-point unit is pipelined and performs all operations except division with a latency of three clock cycles and a throughput of one operation per clock cycle. A single-precision division takes 18 clock cycles and a double-precision division takes 31 clock cycles. Division operations stall the floating-point unit's pipeline until the division is complete. Certain conditions, such as overflow, underflow, and other conditions related to rounding and normalization of floating-point results, may cause the floating-point unit to stall for one clock cycle. Additionally, when storing a single-precision floating-point number to memory, a penalty of up to 23 clock cycles may be incurred if the number is non-zero but small enough that it needs to be denormalized to fit in a single-precision representation. The PowerPC 604 provides a non-IEEE mode in which the processor avoids some of these data-dependent penalties but does not fully comply with the IEEE-754 standard for floating-point arithmetic. In non-IEEE mode, denormalized numbers are simply truncated to zero.
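
The latency and throughput figures above can be turned into rough cycle estimates. The following is a minimal first-order sketch, not a model of the actual hardware: the function names are hypothetical, and it ignores dispatch limits, memory stalls, and the one-cycle exception stalls mentioned above.

```c
/* First-order cycle estimates for the PowerPC 604 floating-point unit,
 * using only the figures quoted in the text. Names are illustrative. */
enum {
    FPU_LATENCY = 3,   /* cycles from issue to result, non-divide ops */
    FDIV_SINGLE = 18,  /* single-precision divide, stalls the pipeline */
    FDIV_DOUBLE = 31   /* double-precision divide, stalls the pipeline */
};

/* n independent operations: one can start every cycle, and the last
 * result appears FPU_LATENCY cycles after the last one starts. */
unsigned fpu_independent_cycles(unsigned n)
{
    return n == 0 ? 0 : n + FPU_LATENCY - 1;
}

/* n operations where each needs the previous result: every operation
 * must wait out the full latency of its predecessor. */
unsigned fpu_dependent_cycles(unsigned n)
{
    return n * FPU_LATENCY;
}

/* n back-to-back divides: the pipeline is stalled for each one. */
unsigned fpu_divide_cycles(unsigned n, int is_double)
{
    return n * (is_double ? FDIV_DOUBLE : FDIV_SINGLE);
}
```

Under these assumptions, eight independent operations take about 10 cycles while an eight-operation dependency chain takes about 24, which is why the ordering of floating-point operations matters on this processor.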
The PowerPC 604's floating-point unit supports multiply-add and multiply-subtract operations.
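
Multiply-add is the core operation of an FIR filter. As an illustration, here is a sketch of one FIR output sample written so that each loop statement is a multiply followed by an add. Whether a compiler actually emits a PowerPC multiply-add instruction for these statements depends on the toolchain and its flags, which is an assumption here. The function name `fir_sample` and the choice of three accumulators (picked to match the three-cycle latency quoted earlier) are illustrative, not taken from the report.

```c
#include <stddef.h>

/* One FIR output sample: sum over k of h[k] * x[k].
 * Three independent accumulators are used so that consecutive
 * multiply-adds do not depend on each other's results. */
float fir_sample(const float *h, const float *x, size_t ntaps)
{
    float acc0 = 0.0f, acc1 = 0.0f, acc2 = 0.0f;
    size_t k = 0;

    /* Main loop: three taps per iteration, one per accumulator. */
    for (; k + 3 <= ntaps; k += 3) {
        acc0 += h[k]     * x[k];
        acc1 += h[k + 1] * x[k + 1];
        acc2 += h[k + 2] * x[k + 2];
    }

    /* Remaining zero, one, or two taps. */
    for (; k < ntaps; ++k) {
        acc0 += h[k] * x[k];
    }

    return (acc0 + acc1) + acc2;
}
```

Splitting the sum across accumulators changes the order of the floating-point additions, so the result can differ in the last bits from a single-accumulator loop.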
The integer data path consists of two simple integer ALUs, a complex integer unit, and a set of thirty-two 32-bit general-purpose registers. The simple integer ALUs perform simple arithmetic operations such as addition and subtraction, as well as logic operations. Each simple integer ALU also includes a barrel shifter for shift and rotate operations. The complex integer unit is used for multiplication, division, and string functions. It can perform multiplications with a latency of three clock cycles and a throughput of one cycle as long as one operand is 16 bits or less in length. Full 32-bit by 32-bit multiplications have a latency of four clock cycles and a throughput of two cycles. The three integer execution units are independent and operate in parallel.
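
For fixed-point DSP code, the multiplier timing above suggests keeping at least one operand of each multiplication within 16 bits. The sketch below shows a dot product over 16-bit (for example Q15) data, where both operands satisfy that condition. The function name `dot_q15` is hypothetical, and the code assumes the caller keeps the sum within the range of a 32-bit accumulator.

```c
#include <stddef.h>
#include <stdint.h>

/* Dot product of two 16-bit vectors with a 32-bit accumulator.
 * Each product has operands of 16 bits, the case the text describes
 * as having three-cycle latency and one-cycle throughput.
 * Precondition (assumed, not checked): the running sum never
 * exceeds the int32_t range. */
int32_t dot_q15(const int16_t *coef, const int16_t *data, size_t n)
{
    int32_t acc = 0;
    for (size_t i = 0; i < n; ++i) {
        acc += (int32_t)coef[i] * (int32_t)data[i];
    }
    return acc;
}
```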
In addition to basic arithmetic and logical operations, the integer execution units provide some powerful bit-manipulation operations, such as rotate-mask-insert.
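
To make the rotate-mask-insert operation concrete, the following sketch emulates the PowerPC `rlwimi` (rotate left word immediate then mask insert) instruction in portable C. The helper names `rotl32` and `ppc_mask` are illustrative. Bit positions follow the PowerPC convention in which bit 0 is the most significant bit; the arguments `sh`, `mb`, and `me` are assumed to be in the range 0 to 31.

```c
#include <stdint.h>

/* Rotate a 32-bit value left by n bits (n taken modulo 32). */
static uint32_t rotl32(uint32_t v, unsigned n)
{
    n &= 31u;
    return n ? (v << n) | (v >> (32u - n)) : v;
}

/* Mask with ones from bit mb through bit me, PowerPC numbering
 * (bit 0 = most significant). The mask wraps around when mb > me. */
static uint32_t ppc_mask(unsigned mb, unsigned me)
{
    uint32_t from_mb = 0xFFFFFFFFu >> mb;          /* bits mb..31 set */
    uint32_t to_me   = 0xFFFFFFFFu << (31u - me);  /* bits 0..me set  */
    return (mb <= me) ? (from_mb & to_me) : (from_mb | to_me);
}

/* Rotate rs left by sh, then insert the bits selected by the mask
 * into ra, leaving the other bits of ra unchanged. */
uint32_t rlwimi(uint32_t ra, uint32_t rs, unsigned sh,
                unsigned mb, unsigned me)
{
    uint32_t m = ppc_mask(mb, me);
    return (rotl32(rs, sh) & m) | (ra & ~m);
}
```

For example, `rlwimi(0x11223344, 0x000000AB, 16, 8, 15)` places the low byte of the second argument into the second-most-significant byte of the first, giving `0x11AB3344`. On the processor this is a single instruction, whereas portable C needs a shift, two ANDs, and an OR.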
The PowerPC 604 has a single 32-bit address space. Separate 16 Kbyte instruction and data caches are available on-chip. Twice as much cache RAM is available on the PowerPC 604e, which provides separate 32 Kbyte instruction and data caches. Accesses to the instruction cache are 128 bits wide, providing the processor with four 32-bit instructions in a single cycle if a cache miss does not occur. Accesses to the data cache are 64 bits wide. However, since general-purpose registers on the PowerPC 604 are only 32 bits wide, the processor can only take advantage of the full 64-bit data cache access width when fetching or storing 64-bit double-precision floating-point variables. The PowerPC 604 supports virtual memory via separate instruction and data TLBs for fast address translations. Cache and TLB parameters are listed in Table 6.6-1.
As on most general-purpose processors, the PowerPC 604's memory space is byte-addressable. The PowerPC 604 supports both big-endian and little-endian byte ordering, but lacks support for misaligned little-endian accesses. The PowerPC 604e adds support for misaligned little-endian accesses.
The PowerPC 604 allows both the instruction and data caches to be locked. Each cache can be locked individually, but each cache must be locked as a whole. That is, it is not possible to lock only a portion of a cache. The PowerPC 604 also includes cache control instructions, including instructions that initiate pre-loading of a cache block, invalidate a cache block, set a cache block to zero, and flush a cache block. Support for maintaining coherency between caches on multiple processors is provided.
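
Because the byte order is selectable and misaligned little-endian accesses are not supported on the 604, code that parses packed data streams is often written to assemble words one byte at a time. The sketch below shows that approach; the function names `load_be32` and `load_le32` are hypothetical. It gives the same result in either byte-ordering mode and never issues a misaligned word access, at the cost of four byte loads per word.

```c
#include <stdint.h>

/* Read a 32-bit big-endian value from a byte buffer at any alignment. */
uint32_t load_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) |
           ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |
            (uint32_t)p[3];
}

/* Read a 32-bit little-endian value from a byte buffer at any alignment. */
uint32_t load_le32(const unsigned char *p)
{
    return ((uint32_t)p[3] << 24) |
           ((uint32_t)p[2] << 16) |
           ((uint32_t)p[1] << 8)  |
            (uint32_t)p[0];
}
```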
Since the PowerPC 604 has only one load/store unit, only one load or store can be performed per clock cycle. The PowerPC 604 uses a four-entry load buffer and a six-entry store buffer to reduce stalls. When a load or store misses the on-chip data cache, it is posted to the appropriate buffer. Other loads or stores can be executed out of order while the cache misses in the load and store buffers are processed. This allows several cache misses to occur before the load/store unit is completely stalled.
The above is a five-page excerpt from the fifteen-page PowerPC 604e analysis in DSP on General-Purpose Processors. For a list of other topics covered in the analysis, please see the Table of Contents of the report.