Fixed-Point DSP Processors
BDTI

LSI Logic ZSP500

The ZSP500, introduced in 2002, is a licensable superscalar DSP core from LSI Logic. It is fundamentally a 16-bit fixed-point architecture, but it also supports a broad range of 32-bit fixed-point operations. The ZSP500 targets cost- and power-sensitive applications including cellular handsets, wireless LANs, portable multimedia devices, and multi-channel voice-over-IP (VoIP) applications. It is currently available for license as a synthesizable Verilog core. According to LSI Logic, the ZSP500 operates at 340 MHz under worst-case conditions at 1.2 volts in a 0.13-micron process.

The ZSP500 is the first core to implement the ZSP G2 instruction set architecture. The ZSP G2 is based on the older architecture used in the ZSP400, and ZSP400 assembly code is upward compatible with the ZSP G2 architecture. Like the ZSP400, the ZSP500 can execute up to four instructions in a single clock cycle, and it features a dual-MAC unit and dual ALUs. LSI Logic also offers two other cores based on the G2 instruction set architecture: the ZSP540 and the ZSP600. Both are quad-MAC architectures. The ZSP540 executes up to four instructions per cycle, while the ZSP600 executes up to six.

Like the ZSP400, the ZSP G2 uses a RISC-like instruction set; however, the ZSP G2 uses a mixed-width 16- and 32-bit instruction set, while the ZSP400 uses only 16-bit instructions. The ZSP G2 adds a variety of new instructions, including support for conditional execution, 40-bit ALU operations, bit field manipulation, and various instructions intended to improve compiler performance. The ZSP G2 also expands the complement of guard registers in the data register file and adds dedicated address registers.

Architecture

The ZSP500 is a superscalar architecture that can issue and complete up to four RISC-like instructions per instruction cycle. The ZSP500 architecture consists of a 16-bit fixed-point data path, a “prefetch unit,” an “instruction scheduler,” and a “pipeline control unit.” The ZSP500 also includes a load/store unit that contains two independent address generation units and a branch prediction unit. The ZSP500 contains an operand register file with sixteen 16-bit registers.

The only memory contained in the ZSP500 core is the small instruction cache inside the prefetch unit. (However, ZSP500 licensees can use a reference system design that features a data cache and program and data SRAM.) The size and architecture of the main memory system will vary among ZSP500-based chips. For example, the ZSP500 can support either a unified program and data memory architecture or a Harvard memory architecture with separate program and data RAM.

The ZSP500 data path contains two 16-bit ALUs and a combined multiplier/ALU. The combined multiplier/ALU contains two 16-bit multiply-accumulate (MAC) units and a 40-bit ALU. The two 16-bit ALUs can be used in parallel with any of the execution units in the combined multiplier/ALU unit; however, the 40-bit ALU cannot be used in parallel with the MAC units.

The data path is fundamentally a 16-bit data path: it uses 16-bit registers as inputs and stores results to 16-bit registers. However, most instructions also have variants that support 32-bit data, using paired 16-bit registers as operands. Each register pair also has a corresponding 8-bit guard register that extends the register pair to 40 bits.
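
The relationship between a register pair and its guard register can be sketched in C. This is only an illustrative model, not LSI Logic's definition: the type `ext_pair_t`, both function names, and the choice of which register holds the high half are assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of one extended register pair: two 16-bit registers
   plus an 8-bit guard register. The field layout is an assumption. */
typedef struct {
    uint16_t lo;    /* low 16 bits of the 32-bit pair  */
    uint16_t hi;    /* high 16 bits of the 32-bit pair */
    uint8_t  guard; /* 8 guard bits above the pair     */
} ext_pair_t;

/* Assemble the 40-bit value and sign-extend it from bit 39. */
static int64_t ext_pair_value(ext_pair_t r)
{
    uint64_t raw = ((uint64_t)r.guard << 32) | ((uint64_t)r.hi << 16) | r.lo;
    int64_t v = (int64_t)raw;          /* raw < 2^40, so this cast is safe */
    if (raw & (1ULL << 39))
        v -= (1LL << 40);              /* negative 40-bit value */
    return v;
}

/* Split a value that fits in 40 bits into the three registers. */
static ext_pair_t ext_pair_from(int64_t v)
{
    assert(v >= -(1LL << 39) && v < (1LL << 39));
    uint64_t u = (uint64_t)v;
    ext_pair_t r;
    r.lo    = (uint16_t)(u & 0xFFFFu);
    r.hi    = (uint16_t)((u >> 16) & 0xFFFFu);
    r.guard = (uint8_t)((u >> 32) & 0xFFu);
    return r;
}
```

In this model, a sum such as 2^31 does not fit in a signed 32-bit register pair, but it still fits in the 40-bit extended value, which is the headroom the guard bits provide.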

The ZSP500 supports several types of register comparisons, bit-field manipulation instructions, and standard arithmetic instructions. The ZSP500 also supports SIMD add and subtract operations, which treat 32-bit data register pairs as packed 16-bit data. Saturation is available for arithmetic operations, arithmetic left-shifts, and multiply operations via a mode bit. Logical operations include AND, OR, exclusive-OR, and NOT.
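
A packed 16-bit add of this kind might behave as in the following C sketch. The function names are hypothetical, the `saturate` argument stands in for the mode bit, and applying saturation to each lane separately is an assumption rather than documented ZSP500 behavior.

```c
#include <assert.h>
#include <stdint.h>

/* Reinterpret a 16-bit lane as a signed value without relying on
   implementation-defined narrowing conversions. */
static int32_t lane_to_signed(uint32_t lane)
{
    lane &= 0xFFFFu;
    return lane >= 0x8000u ? (int32_t)lane - 0x10000 : (int32_t)lane;
}

/* Clamp a 32-bit intermediate to the signed 16-bit range. */
static int32_t sat16(int32_t x)
{
    if (x > INT16_MAX) return INT16_MAX;
    if (x < INT16_MIN) return INT16_MIN;
    return x;
}

/* Hypothetical packed add: each 32-bit word holds two independent signed
   16-bit lanes. saturate models the mode bit (assumed to act per lane). */
static uint32_t simd_add16(uint32_t a, uint32_t b, int saturate)
{
    assert(saturate == 0 || saturate == 1);
    int32_t hi = lane_to_signed(a >> 16) + lane_to_signed(b >> 16);
    int32_t lo = lane_to_signed(a) + lane_to_signed(b);
    if (saturate) {
        hi = sat16(hi);
        lo = sat16(lo);
    }
    /* Keep only the low 16 bits of each lane so no carry crosses lanes. */
    return (((uint32_t)hi & 0xFFFFu) << 16) | ((uint32_t)lo & 0xFFFFu);
}
```

With saturation off, a lane that overflows wraps around; with it on, the lane clamps to the largest or smallest 16-bit value while the other lane is unaffected.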

The combined multiplier/ALU unit contains two 16-bit MAC units that operate with a throughput of up to two 16 × 16 → 16-bit multiplications, up to two 16 × 16 → 32-bit multiplications, or one 32 × 32 → 32-bit multiplication per cycle. The combined multiplier/ALU unit can accept only one instruction per cycle; hence, its two MAC units cannot operate independently. Instead, the ZSP500 uses SIMD dual-MAC operations. Instructions that perform two 16 × 16 → 32-bit multiplications treat two 32-bit operands as pairs of 16-bit registers. Although these instructions perform two multiplications, they generate only one result: the two 32-bit products are always combined and then stored in, or added to, a single register pair that is extended to 40 bits by its guard register.
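
The single-result behavior of a SIMD dual-MAC can be modeled in C as below. This is a sketch under stated assumptions: `dual_mac` and `dot_product` are hypothetical names, the pairing of high with high and low with low halves is assumed, and saturation and any fractional-mode shift are left out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of one SIMD dual-MAC instruction. Each 32-bit operand
   is a pair of 16-bit registers; both 32-bit products are added into one
   accumulator. An int64_t stands in for the 40-bit extended register pair. */
static int64_t dual_mac(int64_t acc,
                        int16_t a_hi, int16_t a_lo,
                        int16_t b_hi, int16_t b_lo)
{
    int32_t p_hi = (int32_t)a_hi * b_hi;   /* first 16 x 16 -> 32 product  */
    int32_t p_lo = (int32_t)a_lo * b_lo;   /* second 16 x 16 -> 32 product */
    return acc + p_hi + p_lo;              /* one combined result          */
}

/* Dot product of n 16-bit samples (n even), two taps per modeled instruction. */
static int64_t dot_product(const int16_t *x, const int16_t *h, int n)
{
    assert(n >= 0 && n % 2 == 0);
    int64_t acc = 0;
    for (int i = 0; i < n; i += 2)
        acc = dual_mac(acc, x[i + 1], x[i], h[i + 1], h[i]);
    return acc;
}
```

Because the two products share one accumulator, this form suits sums of products such as FIR filters and dot products, but it cannot produce two unrelated results in the same cycle.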

The ZSP500 uses an interlocked eight-stage pipeline. Multiply operations have two-cycle latencies and single-cycle throughput; all other data path instructions have single-cycle latency and throughput.

The ZSP500 core interfaces to an off-core memory controller with a 128-bit instruction bus, two 32-bit data read buses, and two 32-bit data write buses. The ZSP500 sends addresses to the memory controller via a 24-bit instruction address bus and two 24-bit data address buses. Although the ZSP500 can address only two data transfers per cycle, it can transfer up to 128 bits of data per cycle by tying the 32-bit data buses together for one 64-bit read and one 64-bit write. Thus, the ZSP500 can complete a maximum of four 16-bit reads and four 16-bit writes per cycle as long as the 16-bit words are arranged in contiguous groups of four in memory.
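
The transfer limits above follow from simple arithmetic on the bus widths, sketched here in C. The constant and function names are hypothetical, and the peak bandwidth figure is a derived illustration that assumes every cycle carries a full 64-bit read and a full 64-bit write.

```c
#include <assert.h>
#include <stdint.h>

/* Bus widths from the text: two 32-bit read buses and two 32-bit write
   buses, each pair tied together into one 64-bit transfer. */
enum { READ_BITS_PER_CYCLE = 64, WRITE_BITS_PER_CYCLE = 64, WORD_BITS = 16 };

/* Number of 16-bit words that fit in one transfer of bus_bits. */
static int words_per_cycle(int bus_bits)
{
    assert(bus_bits > 0 && bus_bits % WORD_BITS == 0);
    return bus_bits / WORD_BITS;
}

/* Peak combined read + write traffic in bytes per second at clock_hz,
   assuming both 64-bit transfers are fully used on every cycle. */
static uint64_t peak_bytes_per_second(uint64_t clock_hz)
{
    return clock_hz * (READ_BITS_PER_CYCLE + WRITE_BITS_PER_CYCLE) / 8;
}
```

Under these assumptions, `words_per_cycle(64)` gives the four 16-bit words per direction mentioned above, and `peak_bytes_per_second(340000000)` gives 5,440,000,000 bytes per second at the quoted 340 MHz clock.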

The ZSP500 contains two AGUs. Each AGU supports one 16-, 32-, 40-, or 64-bit transfer per cycle. The ZSP500 supports register-indirect addressing with optional post-modification by an immediate value or by the contents of a modifier register. The ZSP500 also supports bit-reversed addressing and indexed addressing.
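
Two of these addressing modes can be illustrated with a software model in C. The function names are hypothetical, and the code only mimics the effect of the modes; it does not describe how the ZSP500 AGUs compute addresses in hardware.

```c
#include <assert.h>
#include <stdint.h>

/* Reverse the low `bits` bits of index i. This is the access order that
   bit-reversed addressing produces for a 2^bits-point FFT buffer. */
static uint32_t bit_reverse(uint32_t i, unsigned bits)
{
    assert(bits >= 1 && bits <= 16);
    uint32_t r = 0;
    for (unsigned k = 0; k < bits; k++) {
        r = (r << 1) | (i & 1u);   /* move the lowest bit of i into r */
        i >>= 1;
    }
    return r;
}

/* Register-indirect load with post-modification: read through the pointer,
   then advance it by step elements. step models either an immediate value
   or the contents of a modifier register. */
static int16_t load_postmod(const int16_t **p, int step)
{
    int16_t v = **p;
    *p += step;
    return v;
}
```

For an 8-point buffer, `bit_reverse(i, 3)` maps the indices 0 through 7 to 0, 4, 2, 6, 1, 5, 3, 7. Doing this in the AGU saves the explicit reordering pass that an FFT would otherwise need.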

Peripherals

The ZSP500 core includes two timers.

Power Consumption

The ZSP500 is a synthesizable core; power consumption will vary among implementations. According to LSI Logic, the ZSP500 consumes 12.9 mW at 50 MHz and 1.0 volts in the TSMC CL013LV low-K process. This number is for the core only and is based on LSI Logic’s vector dot product benchmark.
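
To compare this figure with other cores, it helps to normalize it by clock rate. The C sketch below does that arithmetic; the function names are hypothetical. The result applies only to the quoted operating point (1.0 volts, the stated process and benchmark) and should not be scaled to other voltages.

```c
#include <assert.h>

/* Power normalized by clock rate. One mW/MHz equals one nJ per cycle. */
static double mw_per_mhz(double power_mw, double freq_mhz)
{
    assert(freq_mhz > 0.0);
    return power_mw / freq_mhz;
}

/* Average energy per clock cycle in picojoules. */
static double pj_per_cycle(double power_mw, double freq_mhz)
{
    return mw_per_mhz(power_mw, freq_mhz) * 1000.0;   /* nJ -> pJ */
}
```

For the quoted 12.9 mW at 50 MHz, this works out to about 0.258 mW/MHz, or roughly 258 pJ per cycle.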

Cost

The ZSP500 is licensed as a core. LSI Logic does not publicly disclose license fees and royalties. Production costs are chip-specific.

For Additional Information

The ZSP500 achieves a BDTIsimMark2000™ score of 2690 at 340 MHz. A complete analysis of this processor, including BDTI Benchmark™ results, is contained in BDTI’s report, Inside the LSI Logic ZSP500.

Last updated January 2005.
