Fixed-Point DSP Processors
LSI Logic LSI40x (ZSP400)

The ZSP400 is a 16-bit fixed-point superscalar DSP core. First introduced by LSI Logic in 1999, the ZSP400 uses a four-issue superscalar architecture that can issue and complete up to four RISC-like instructions per instruction cycle. LSI Logic offers the ZSP400 core for license and uses the core as the basis for its LSI40x family of packaged processors. The ZSP400 core targets wireless communications equipment, such as cellular base stations, cellular handsets, and high-speed wireless LANs; voice-over-networks; and consumer devices. The ZSP400 is assembly-level upwards-compatible with the ZSP500.

The first member of the LSI40x processor family, the 200 MHz LSI402ZX, began shipping in July 2000. The family also includes the LSI403LC and LSI403LP. The LSI403LC is currently sampling at 120 MHz and the LSI403LP is shipping at up to 150 MHz. The LSI402ZX operates at a core voltage of 1.8 volts, while the LSI403LC and LSI403LP operate at a core voltage of 1.2 volts.

Architecture

The ZSP400 architecture consists of a 16-bit fixed-point data path, a “pipeline control unit,” and instruction and data caches. The pipeline control unit is responsible for fetching instructions and controlling program flow and also acts as an interface between the on-chip caches, memory, peripherals, and data path. The ZSP400 can support shared program and data memory or a Harvard memory architecture with separate program and data RAM. The LSI40x family uses a Harvard memory architecture.

The ZSP400 does not contain an address generation unit. Instead, address generation is handled by the ALUs and the cache controllers. Although this method of address generation is typical for a general-purpose processor, it is unusual for a DSP.

The ZSP400 uses a 16-bit data path which contains two 16-bit ALUs and two 16 × 16 → 32 multiply-accumulate (MAC) units. The data path is fundamentally a 16-bit data path: it uses 16-bit registers as inputs and stores results to 16-bit registers.
However, most instructions also have variants that support 32-bit data, using concatenated 16-bit registers as operands. Several instructions have SIMD variants that treat each 32-bit input operand as two 16-bit operands.

The data path contains sixteen 16-bit general-purpose registers, R0-R15. The registers can be concatenated in pairs (e.g., R1/R0, R3/R2) to form eight 32-bit registers, where the odd register makes up the upper 16 bits and the even register makes up the lower 16 bits. Two of the 32-bit register pairs, R1/R0 and R3/R2, are designated as accumulators A and B. A special 16-bit register contains eight guard bits for each of the accumulators.

The ALUs perform a variety of operations including barrel shifting and bit manipulation. Bit-manipulation capabilities include set, clear, toggle, and test of individual bits in a 16-bit register. Unlike most instructions, the bit-manipulation operations do not have 32-bit variants.

The MAC units can perform up to two 16 × 16 → 32-bit multiplications or one 32 × 32 → 32-bit multiplication in a single instruction cycle. Operands are always treated as signed values. A mode bit determines whether operands are treated as integers or as fractions. The add-compare-select instructions, which make use of the multiplier unit, can be used to implement a Viterbi “butterfly” in three instruction cycles.

The ZSP400 uses a Harvard memory architecture with up to 62 K × 16 each for internal program and data memories, 2 K × 16 for program memory mapped to boot ROM, and 2 K × 16 for data memory mapped to peripheral registers. The LSI402ZX contains 62 K × 16 each for program and data memory. The LSI403LC and LSI403LP use 16 K × 16 for each. The LSI403LC and LSI403LP also include a 16 K × 16 block of RAM that can be configured as either program or data memory. The ZSP400 provides separate on-chip caches for instructions and data.
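The register pairing, accumulator guard bits, and integer/fractional multiplier modes described above can be modeled in a few lines of Python. This is a minimal sketch, not LSI Logic code. The function names are hypothetical, and the guard bits are passed in as a separate argument because the layout of the guard register is not given here. Fractional mode is assumed to follow the conventional Q15 convention (product shifted left by one bit to give a Q31 result), and overflow or saturation behavior is not modeled.

```python
"""Illustrative model of ZSP400-style register pairs and 16 x 16 multiplies.

All names are hypothetical and chosen for this sketch only.
"""

MASK16 = 0xFFFF


def to_signed(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of `value` as a two's-complement number."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):
        return value - (1 << bits)
    return value


def pair_to_u32(odd_reg: int, even_reg: int) -> int:
    """Concatenate a register pair such as R1/R0.

    The odd register supplies the upper 16 bits, the even register the lower 16.
    """
    return ((odd_reg & MASK16) << 16) | (even_reg & MASK16)


def accumulator_value(guard: int, odd_reg: int, even_reg: int) -> int:
    """Form a signed 40-bit value from 8 guard bits plus a 32-bit register pair."""
    raw = ((guard & 0xFF) << 32) | pair_to_u32(odd_reg, even_reg)
    return to_signed(raw, 40)


def multiply(a: int, b: int, fractional: bool) -> int:
    """Signed 16 x 16 -> 32 multiply of two 16-bit register values.

    Assumption: fractional mode treats operands as Q15 and shifts the
    product left by one bit to produce Q31. Overflow is not handled.
    """
    product = to_signed(a, 16) * to_signed(b, 16)
    if fractional:
        product <<= 1
    return product
```

For example, `pair_to_u32(0x1234, 0x5678)` yields `0x12345678`, and `multiply(0x4000, 0x4000, fractional=True)` yields `0x20000000`, which is 0.5 × 0.5 = 0.25 in Q31 under the assumption stated above.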
The instruction cache is a direct-mapped, eight-line cache with a line size of four 16-bit words, corresponding to four instructions per cache line. In each clock cycle, four instructions can be fetched from program memory into the instruction cache, and four instructions can be transferred from the instruction cache to the program control unit. A prefetch mechanism attempts to predict program flow and load instructions into the instruction cache.

The data cache is fully associative and contains eight cache lines with a line size of four 16-bit words. As with the instruction cache, the data cache has a prefetch mechanism.

On a 200 MHz ZSP400, the maximum on-chip data memory bandwidth is 800 million 16-bit words/second for loads and 400 million 16-bit words/second for stores, if data is arranged as 16-bit pairs in memory. The sustainable bandwidth is either 800 million 16-bit words/second for loads, or 400 million 16-bit words/second for loads and 400 million 16-bit words/second for stores.

The ZSP400 supports register-direct and indexed register-indirect addressing. Any general-purpose register can be used as an address register. Many instructions also support immediate operands.

The ZSP400 does not provide hardware looping, but four loop counters are provided. These loop counters enable the ZSP400 to perform software looping without loop overhead in most cases.

Peripherals

The ZSP400 core includes two 16-bit timers. In addition to these timers, the on-chip peripherals of the LSI40x include serial ports, bit I/O, a host port, and DMA controllers.

Power Consumption

The LSI403LC consumes about 63 mW at 150 MHz and 1.2 V. This measurement includes the core, memory, and all peripherals including the PLL. Except for the PLL, all peripherals are inactive. The workload for this measurement is the BDTI Block FIR Filter benchmark.

Cost

The ZSP400 is a processor core intended for use as part of a complete chip, and therefore does not have a fixed per-unit price.
ZSP400-based chip prices will vary. As of the last quarter of 2004, 10,000-unit prices for the LSI40x range from $3.96 for the LSI403LC to $12.75 for the LSI402ZX.

For Additional Information

The ZSP400 achieves a BDTImark2000™ score of 1220 at 260 MHz. A complete analysis of this processor, including BDTI Benchmark™ results, is contained in BDTI’s report, Inside the LSI Logic ZSP500.

Last updated January 2005.