Fixed-Point DSP Processors
Intel PXA255 and PXA26x

The Intel PXA255 and PXA26x are DSP-enhanced 32-bit fixed-point general-purpose processors based on Intel’s XScale processor core. The XScale architecture was introduced by Intel in 2000 and is a descendant of the Intel StrongARM core. It implements the ARM v5TE instruction set architecture and is upward binary-compatible with the StrongARM and several other ARM architectures, including the ARM7, ARM9, and ARM9E cores. The PXA255 and PXA26x target handheld electronic devices, such as 3G cellular phones and PDAs with wireless connectivity.

The PXA255 and PXA26x are currently available in 200 MHz, 300 MHz, and 400 MHz variants; all chips can operate at core voltages ranging from 1.0 to 1.3 volts. For example, the 400 MHz variant can operate at 400 MHz at 1.3 volts, at 300 MHz at 1.1 volts, or at 200 MHz at 1.0 volts.

Architecture

XScale is a single-issue 32-bit RISC architecture that consists of a program control unit and a fixed-point integer data path. The XScale data path is augmented by a tightly coupled “DSP coprocessor” that contains a 40-bit accumulator and adds eight DSP-oriented instructions to the XScale instruction set.

The XScale data path contains a 32-bit ALU, a 32 × 32 multiply-accumulate unit, and a barrel shifter. The data path has sixteen 32-bit registers (R0-R15) that serve as source and destination operands and as address registers. R15, R14, and R13 also serve as the program counter, link register, and stack pointer, respectively; the remaining registers are available for general use. XScale is capable of processing 8-, 16-, and 32-bit integer data in a traditional RISC single-instruction, single-data fashion, and 16-bit integer data types in a SIMD fashion.

The multiply-accumulate unit supports signed or unsigned, 32 × 32 → 32 (“single-precision”) or 32 × 32 → 64 (“double-precision”) integer multiplications. Any general-purpose register can serve as the destination for single-precision multiply operations.
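
To make the single- versus double-precision distinction concrete, the following is a minimal sketch in portable C that models only the arithmetic of the two result widths. The function names are hypothetical, and nothing here is taken from Intel's documentation; which XScale instruction a compiler would actually emit for each function is not assumed.

```c
#include <stdint.h>

/* 32 x 32 -> 32 ("single-precision"): only the low 32 bits of the
 * product are kept, so large products wrap around.
 * Hypothetical helper name, for illustration only. */
static int32_t mul32_single(int32_t a, int32_t b)
{
    /* Multiply as unsigned to avoid signed-overflow undefined behavior;
     * the low 32 bits are the same for signed and unsigned operands. */
    return (int32_t)((uint32_t)a * (uint32_t)b);
}

/* 32 x 32 -> 64 ("double-precision"): the full signed product, which
 * on the processor would occupy a pair of 32-bit registers. */
static int64_t mul32_double(int32_t a, int32_t b)
{
    return (int64_t)a * (int64_t)b;
}

/* 32 x 32 -> 64 multiply-accumulate: add the full product to a 64-bit
 * running sum. The addition is done on unsigned values so that it wraps
 * instead of invoking undefined behavior if the sum overflows. */
static int64_t mac32_double(int64_t acc, int32_t a, int32_t b)
{
    uint64_t product = (uint64_t)((int64_t)a * (int64_t)b);
    return (int64_t)((uint64_t)acc + product);
}
```

For instance, multiplying 65536 by 65536 gives 2^32: the single-precision form returns 0 because the low 32 bits are all zero, while the double-precision form returns the full value.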
XScale provides no guard bits for 32 × 32 multiplications. In double-precision multiply operations, any combination of two general-purpose registers can be specified to form a 64-bit destination register. Single-precision multiply operations include multiply, multiply-add, and multiply-accumulate. Double-precision multiply operations support only multiply and multiply-accumulate.

The multiply-accumulate unit also supports signed integer 32 × 16 and 16 × 16 multiplies, multiply-adds, and multiply-accumulates. Each 16-bit input can be taken from the high or low half of a 32-bit register. The 32-bit result can be stored in a 32-bit register; the processor also supports a 16 × 16 → 64 multiply-accumulate operation that provides 32 guard bits.

The DSP coprocessor adds six multiply-accumulate operations to the XScale architecture: 16 × 16 + 16 × 16 → 40 (SIMD), 16 × 16 → 40 (four variants), and 32 × 32 → 40. Any general-purpose register can serve as an input operand; the single 40-bit coprocessor accumulator is the destination. The DSP coprocessor cannot directly access memory; instead, it supports two instructions that perform data transfers between the coprocessor accumulator and any two general-purpose registers.

The ALU in the main data path supports addition and subtraction (with or without carry) including saturating add and subtract instructions. XScale supports “double-saturating” versions of these instructions that left-shift one source operand by one (which removes the extra sign bit generated by multiplying two fractional values), saturate the left-shifted value, and then add/subtract this result to/from the other source operand. The final result is saturated again before writing to the destination register. The ALU also supports and, or, and xor operations, bit clear/test, 32-bit comparisons, and a count leading zeros (CLZ) instruction that supports normalization by computing the number of zeroed high-order bits of a source operand.
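
The saturating, double-saturating, and count-leading-zeros behavior described above can be modeled in a few lines of C. This is a behavioral sketch only: the function names are hypothetical, the model follows the description in the text rather than any Intel reference code, and it ignores status flags that real hardware may set on saturation.

```c
#include <stdint.h>

/* Clamp a wide intermediate value to the signed 32-bit range. */
static int32_t sat32(int64_t x)
{
    if (x > INT32_MAX) return INT32_MAX;
    if (x < INT32_MIN) return INT32_MIN;
    return (int32_t)x;
}

/* Saturating add: a + b, clamped instead of wrapping. */
static int32_t sat_add(int32_t a, int32_t b)
{
    return sat32((int64_t)a + (int64_t)b);
}

/* "Double-saturating" add as described in the text: shift b left by one
 * (i.e. double it), saturate, add to a, then saturate again. */
static int32_t sat_double_add(int32_t a, int32_t b)
{
    int32_t doubled = sat32((int64_t)b * 2);
    return sat32((int64_t)a + (int64_t)doubled);
}

/* Double-saturating subtract: a - sat(2 * b), saturated. */
static int32_t sat_double_sub(int32_t a, int32_t b)
{
    int32_t doubled = sat32((int64_t)b * 2);
    return sat32((int64_t)a - (int64_t)doubled);
}

/* Count of zero bits above the most significant set bit.
 * Returns 32 for an input of zero. */
static unsigned count_leading_zeros(uint32_t x)
{
    unsigned n = 0;
    if (x == 0) return 32;
    while ((x & 0x80000000u) == 0) {
        x <<= 1;
        n++;
    }
    return n;
}
```

As a usage example, the Q15 product 0x4000 × 0x4000 (0.5 × 0.5) is 0x10000000 in Q30 format; passing it as the second operand of sat_double_add with a zero accumulator yields 0x20000000, which is 0.25 in Q31 format with the redundant sign bit removed.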
XScale does not support rounding via dedicated hardware or instructions.

The barrel shifter performs logical shift left, logical shift right, arithmetic shift right, and rotate right operations. As part of an ALU instruction, the barrel shifter can shift one of the ALU input operands prior to an ALU operation. If the shift is specified by an immediate operand, this shift operation is executed in the same cycle as the ALU instruction. Shifts are not supported by saturating arithmetic and normalization instructions, nor are they supported by multiplication instructions.

XScale uses a Harvard memory architecture with separate buses for instructions and data. PXA255 and PXA26x family members each provide separate 32 Kbyte instruction and data caches and have a special 2 Kbyte mini data cache intended for “streaming” data. The main data cache is 32-way set associative and employs a round-robin replacement policy. Single data transfers of up to 32 bits per cycle are supported between the data cache and the core. Load and store “double” instructions, which transfer 64 bits every two cycles, must be aligned on 64-bit boundaries. XScale also supports multiple-word load and store instructions that use a single instruction to transfer up to sixteen 32-bit data words to/from consecutive memory locations at a rate of one 32-bit word per cycle (plus some initial setup overhead). At 400 MHz, XScale can achieve a maximum sustainable memory bandwidth of 800 million 16-bit half-words/second, assuming the words are arranged as pairs in memory.

XScale does not support hardware looping. XScale includes a branch prediction unit. A 128-entry branch target buffer tracks the history of the most recent 128 conditional branch instructions and encodes the likelihood of a future branch as one of four states: strongly-taken, weakly-taken, weakly-not-taken, and strongly-not-taken. A mispredicted branch consumes five cycles, while a correctly predicted branch requires only one cycle.
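
Because loops on XScale rely on ordinary branches, the cost of a loop's closing branch depends on the predictor. The sketch below models one branch target buffer entry using the four states and the one- and five-cycle costs given above. The text does not specify how an entry moves between states, so the transition rule here (a conventional two-bit saturating counter) is an assumption, as are all type and function names.

```c
#include <stdbool.h>

/* The four prediction states named in the text, ordered so that a
 * larger value means "more likely taken". */
typedef enum {
    STRONGLY_NOT_TAKEN = 0,
    WEAKLY_NOT_TAKEN   = 1,
    WEAKLY_TAKEN       = 2,
    STRONGLY_TAKEN     = 3
} branch_state;

/* Predict "taken" in either of the two taken states. */
static bool predict_taken(branch_state s)
{
    return s >= WEAKLY_TAKEN;
}

/* ASSUMED transition rule: move one step toward the actual outcome,
 * saturating at the strong states (two-bit saturating counter). */
static branch_state update_state(branch_state s, bool taken)
{
    if (taken)
        return (s == STRONGLY_TAKEN) ? s : (branch_state)(s + 1);
    return (s == STRONGLY_NOT_TAKEN) ? s : (branch_state)(s - 1);
}

/* Cycle cost from the text: 1 if predicted correctly, 5 if not. */
static unsigned branch_cycles(branch_state s, bool taken)
{
    return (predict_taken(s) == taken) ? 1u : 5u;
}

/* Total branch cycles for a loop whose closing branch is taken on
 * every iteration except the last. Updates *s as it goes. */
static unsigned loop_branch_cycles(branch_state *s, unsigned iterations)
{
    unsigned total = 0;
    for (unsigned i = 0; i < iterations; i++) {
        bool taken = (i + 1 < iterations);
        total += branch_cycles(*s, taken);
        *s = update_state(*s, taken);
    }
    return total;
}
```

Under these assumptions, a four-iteration loop starting from the weakly-not-taken state spends 5 + 1 + 1 + 5 = 12 cycles on its closing branch: one misprediction on entry, two correct predictions, and one misprediction on exit.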
Addressing modes supported by XScale include register-direct, register-indirect with pre- or post-increment/decrement, and register-indirect with indexing. Multiple-word loads and stores support register-indirect addressing with pre- or post-increment. Single-word load and store instructions support register-indirect addressing with pre- or post-increment/decrement. XScale does not support bit-reversed addressing or modulo (circular) addressing.

Peripherals

The PXA255 and PXA26x offer a variety of peripherals, including an LCD controller, a USB controller, an AC97 controller, an IrDA port, an I2C port, an I2S port, three UARTs, several timers, a real-time clock, a 16-channel DMA controller, and an external memory controller supporting SDRAM, flash ROM, and Multimedia Card (MMC) devices. The PXA26x (but not the PXA255) also offers up to 32 MBytes of flash memory “stacked” on top of the processor as part of a multi-chip package.

Power Consumption

According to Intel, the PXA255 consumes 178 mW at 200 MHz and 1.0 volts. This measurement is based on the core running the Dhrystone 2.1 benchmark, and includes power for the processor core, on-chip memory, and the peripherals. The peripherals are clocked but are not performing any transactions for this measurement.

Cost

Intel no longer supplies pricing information for the PXA255 or PXA26x.

For Additional Information

The PXA255 and PXA26x achieve a BDTImark2000™ score of 930 at 400 MHz. Additional analysis of this processor, including BDTI Benchmark™ results, is contained in BDTI’s report, Inside the Intel PXA27x.

Last updated January 2005.