Fixed-Point DSP Processors
BDTI

Analog Devices ADSP-BF5xx (Blackfin)

The ADSP-BF5xx (Blackfin) is a family of 16-bit fixed-point DSP packaged processors from Analog Devices. The ADSP-BF5xx combines features of low-power DSPs with features traditionally associated with general-purpose processors, such as privilege modes and memory protection. The ADSP-BF5xx targets power-sensitive applications, such as cellular phones; applications that require the functionality of both a DSP and a general-purpose processor, such as automotive applications; and computationally intensive applications, such as communications infrastructure equipment.

The ADSP-BF5xx is based on the Micro Signal Architecture (MSA) instruction set architecture jointly developed by Analog Devices and Intel. There are two generations of Blackfin processors, which use different microarchitectures and have different pipeline depths. The ADSP-BF535 is the sole first-generation Blackfin processor. Second-generation Blackfin processors include the ADSP-BF531, ADSP-BF532, ADSP-BF533, and dual-core ADSP-BF561, all of which are optimized for higher clock speeds and lower power consumption. The fastest members of the ADSP-BF5xx family, the ADSP-BF533 and ADSP-BF561, operate at 756 MHz at 1.4 volts.

Because of the differences in microarchitecture, code optimized for the ADSP-BF535 may execute in a different number of cycles on a second-generation device (ADSP-BF531, ADSP-BF532, ADSP-BF533, and ADSP-BF561). Additionally, there is a small class of instructions for which functionality differs slightly between generations of Blackfin processors. Thus binary compatibility between generations is not maintained. In this overview all of the Blackfin processors are referred to collectively as the “ADSP-BF5xx” in cases where all family members share the same characteristics. Readers interested in ADSP-BF535 benchmark results should refer to BDTI’s report, Inside the ADI/Intel Micro Signal Architecture.

Architecture

The ADSP-BF5xx architecture contains two fixed-point data paths, two address generation units, a program sequencer, and up to five separate memory banks. The ADSP-BF5xx uses a load/store architecture that generally takes inputs from and returns results to the data register file. The data register file contains eight 32-bit data registers. Two separate sets of register files are provided for addressing. The ADSP-BF5xx also includes two 40-bit accumulator registers.

Each of the two ADSP-BF5xx data paths includes a multiplier and an ALU. One of the data paths also includes a barrel shifter. The ADSP-BF5xx can issue one instruction that uses the two ALUs or the two MAC units in parallel, but it does not support instructions that use, for example, one ALU and one MAC unit. Shifter operations cannot be executed in parallel with ALU or MAC operations. Thus, the ADSP-BF5xx data paths are not fully independent.
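The pairing restrictions described above can be written down as a small predicate. This is only an illustrative sketch of the rules as stated in this overview; the type and function names are hypothetical and do not come from any Analog Devices tool or header.

```c
/* Execution-unit kinds named in the text (hypothetical enum for illustration). */
typedef enum { UNIT_ALU, UNIT_MAC, UNIT_SHIFTER } unit_kind;

/* Returns 1 if one instruction may use units a and b in parallel,
 * following the rules stated in the overview:
 *   - two ALUs or two MAC units may be paired;
 *   - one ALU with one MAC unit may not;
 *   - the shifter cannot run in parallel with ALU or MAC operations
 *     (and there is only one shifter, so it cannot pair with itself). */
int can_pair_in_one_instruction(unit_kind a, unit_kind b)
{
    if (a == UNIT_SHIFTER || b == UNIT_SHIFTER) {
        return 0;
    }
    return a == b;
}
```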

In addition to instructions that use both ALUs or both MAC units in parallel, the ADSP-BF5xx supports SIMD operations within each ALU and within the shifter (but not within the MAC units). These SIMD operations allow each execution unit to perform two operations per cycle. Thus, it is possible to perform four operations per clock cycle using both ALUs or two operations per clock cycle using the shifter. However, the SIMD operations within the ALUs are supported only as part of a SIMD operation across the data paths, so that both ALUs must perform the same type of SIMD computation (e.g., a dual 16-bit add/subtract).

Each MAC unit can perform a 16-bit multiplication with accumulation to a 40-bit accumulator in a single cycle. Optional saturation and rounding to 16 or 32 bits is supported. The MAC units can also be used in parallel to perform two 16-bit multiplies per cycle. Using both MAC units, the ADSP-BF5xx can also perform a 32-bit multiply with a 32-bit result in five cycles. Eight-bit multiplication is not supported.
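As a rough behavioral model of the MAC operation described above, the following C sketch accumulates 16-bit products into a value clamped to the signed 40-bit range, and uses two accumulators to mimic the two MAC units working in parallel. It is a sketch under stated assumptions, not a description of the actual instruction semantics: the function names are hypothetical, saturation is applied on every accumulate, and rounding and fractional-mode scaling are not modeled.

```c
#include <stddef.h>
#include <stdint.h>

#define ACC40_MAX ((int64_t)0x7FFFFFFFFFLL)   /*  2^39 - 1 */
#define ACC40_MIN (-(int64_t)0x8000000000LL)  /* -2^39     */

/* Clamp a value to the signed 40-bit accumulator range. */
int64_t sat40(int64_t v)
{
    if (v > ACC40_MAX) return ACC40_MAX;
    if (v < ACC40_MIN) return ACC40_MIN;
    return v;
}

/* One 16x16-bit multiply-accumulate into a 40-bit accumulator
 * (assumption: integer mode, saturating on overflow). */
int64_t mac40(int64_t acc, int16_t a, int16_t b)
{
    int32_t product = (int32_t)a * (int32_t)b;
    return sat40(acc + product);
}

/* Optional saturation of the accumulator to a 32-bit result. */
int32_t sat32(int64_t acc)
{
    if (acc > INT32_MAX) return INT32_MAX;
    if (acc < INT32_MIN) return INT32_MIN;
    return (int32_t)acc;
}

/* Dot product with two accumulators, standing in for the two MAC
 * units each handling one product per loop iteration. */
int64_t dot_dual_mac(const int16_t *x, const int16_t *h, size_t n)
{
    int64_t a0 = 0, a1 = 0;
    size_t i;
    for (i = 0; i + 1 < n; i += 2) {
        a0 = mac40(a0, x[i], h[i]);
        a1 = mac40(a1, x[i + 1], h[i + 1]);
    }
    if (i < n) {                 /* odd-length tail */
        a0 = mac40(a0, x[i], h[i]);
    }
    return sat40(a0 + a1);
}
```

The 40-bit accumulator leaves eight guard bits above a 32-bit product, which is why a long sum of 16-bit products can be accumulated before saturation comes into play.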

Supported ALU operations include add, subtract, add/subtract, comparison, minimum, maximum, a divide primitive, and various logic operations. The ADSP-BF5xx also provides a “vector search” instruction intended to find extrema in an array of data. Special eight-bit operations are included for video applications. Dual SIMD 16-bit additions and subtractions are also supported; one variant takes inputs from the packed 16-bit halves of two 32-bit data registers, producing two 32-bit results. Two instructions are provided that perform simultaneous addition and subtraction. One variation takes the values from two 32-bit source registers and computes both their sum and their difference, placing the two results in two registers. The other variation takes four 16-bit inputs (from two 32-bit registers) and performs two 16-bit additions and two 16-bit subtractions. The four 16-bit results are packed into two 32-bit destination registers.
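The four-input add/subtract variation can be illustrated with a short C model. This is a sketch only: the struct and function names are hypothetical, the pairing of corresponding halves (high with high, low with low) is an assumption, and results wrap modulo 2^16 here because this overview does not say whether the instruction saturates.

```c
#include <stdint.h>

/* Two 32-bit "destination registers", each holding two packed 16-bit results. */
typedef struct {
    uint32_t sums;   /* (a_hi + b_hi) in the high half, (a_lo + b_lo) in the low half */
    uint32_t diffs;  /* (a_hi - b_hi) in the high half, (a_lo - b_lo) in the low half */
} addsub_result;

/* Model of a dual 16-bit add plus dual 16-bit subtract on packed operands.
 * Assumption: wraparound arithmetic, corresponding halves paired. */
addsub_result addsub16x2(uint32_t a, uint32_t b)
{
    uint16_t a_hi = (uint16_t)(a >> 16), a_lo = (uint16_t)(a & 0xFFFFu);
    uint16_t b_hi = (uint16_t)(b >> 16), b_lo = (uint16_t)(b & 0xFFFFu);
    addsub_result r;

    r.sums  = ((uint32_t)(uint16_t)(a_hi + b_hi) << 16)
            |  (uint32_t)(uint16_t)(a_lo + b_lo);
    r.diffs = ((uint32_t)(uint16_t)(a_hi - b_hi) << 16)
            |  (uint32_t)(uint16_t)(a_lo - b_lo);
    return r;
}
```

Producing a sum and a difference of the same operands in one step is the shape of a radix-2 FFT butterfly, which is a common use for this kind of operation.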

The first-generation ADSP-BF535 pipeline has eight stages; the second-generation ADSP-BF531, ADSP-BF532, ADSP-BF533, and ADSP-BF561 pipelines have ten stages. Both generations have fully interlocked pipelines. Most instructions have single-cycle latencies, but a few have multi-cycle latencies. Moves between dissimilar registers, such as a move from a loop counter to a data register, account for most of the multi-cycle latencies. In general, these instructions have a latency of two cycles; a few instructions have latencies of up to six cycles.

The ADSP-BF5xx has no delayed branches; instead, the ADSP-BF5xx speculatively executes code after a conditional branch. The ADSP-BF5xx uses static branch prediction for all conditional branches. The programmer (or compiler) must specify the prediction for each branch. Correctly predicted branches on the ADSP-BF535 use four cycles if they are taken and only one cycle if they are not taken; all mispredicted branches consume seven cycles. The longer pipeline of the second-generation ADSP-BF531, ADSP-BF532, and ADSP-BF533 adds one cycle to the time required by mispredicted branches.
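Because the prediction is chosen statically, its effect on cycle count can be estimated from how often a branch is actually taken. The sketch below uses the ADSP-BF535 figures quoted above; the function and constant names are hypothetical. Passing a misprediction cost of eight cycles to approximate a second-generation device assumes the correctly predicted costs are unchanged, which this overview does not state.

```c
/* Cycle costs quoted in the text for the ADSP-BF535. */
enum {
    CYCLES_PREDICTED_TAKEN     = 4,  /* correctly predicted, taken     */
    CYCLES_PREDICTED_NOT_TAKEN = 1,  /* correctly predicted, not taken */
    CYCLES_MISPREDICTED_BF535  = 7   /* any misprediction              */
};

/* Total cycles spent on one conditional branch over many executions.
 * predict_taken is the static prediction chosen by the programmer or
 * compiler; mispredict_cycles is the per-misprediction cost. */
long branch_cycles(long times_taken, long times_not_taken,
                   int predict_taken, int mispredict_cycles)
{
    if (predict_taken) {
        return times_taken * CYCLES_PREDICTED_TAKEN
             + times_not_taken * mispredict_cycles;
    }
    return times_not_taken * CYCLES_PREDICTED_NOT_TAKEN
         + times_taken * mispredict_cycles;
}
```

For a branch that is taken 90 times out of 100, predicting "taken" costs 430 cycles with these figures, against 640 cycles when predicting "not taken".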

All ADSP-BF5xx memory is organized into a single unified 32-bit address space. However, the ADSP-BF5xx uses a physical memory hierarchy with two memory levels. These memory levels are referred to as level one (L1) and level two (L2). All ADSP-BF5xx family members include a similar, but not identical, L1 memory system on-chip, but only the ADSP-BF535 includes an on-chip L2 memory system.

The ADSP-BF5xx L1 memory system uses a modified Harvard memory architecture. The processor accesses L1 memory through a 64-bit instruction bus and two 32-bit data buses; each bus has a corresponding 32-bit address bus. The ADSP-BF5xx can perform one instruction read and two data transfers in each cycle. Only one of the transfers can be a store. If data is arranged as 16-bit pairs in memory, the ADSP-BF5xx can transfer four 16-bit values each cycle. Thus, the maximum sustainable data bandwidth is 3,024 million 16-bit words per second at 756 MHz for reads, or 1,512 million 16-bit words per second for writes.
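The bandwidth figures above follow directly from the bus widths and clock rate. A minimal sketch of the arithmetic, with a hypothetical helper name:

```c
#include <stdint.h>

/* Peak transfer rate in millions of words per second.
 * clock_mhz:  core clock in MHz
 * buses_used: data buses active in the same cycle
 * bus_bits:   width of each data bus
 * word_bits:  size of the data word being counted */
uint32_t peak_mwords_per_sec(uint32_t clock_mhz, uint32_t buses_used,
                             uint32_t bus_bits, uint32_t word_bits)
{
    uint32_t words_per_cycle = buses_used * (bus_bits / word_bits);
    return clock_mhz * words_per_cycle;
}
```

Reads can use both 32-bit data buses, giving 2 x (32 / 16) = 4 words per cycle, or 3,024 million words per second at 756 MHz. Only one transfer per cycle can be a store, so writes use a single bus: 2 words per cycle, or 1,512 million words per second.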

The ADSP-BF5xx has two address units that can each generate an independent address in each cycle. The processor supports register-indirect and register-indirect with post-increment or post-decrement addressing.

Peripherals

Current ADSP-BF5xx family members include general-purpose I/O, two synchronous serial ports, one or two SPI ports, one or two parallel ports (none on the ADSP-BF535), timers (including a watchdog timer), one or two UARTs, a real-time clock, and one or two DMA controllers. The ADSP-BF535 also includes a PCI interface and USB interface.

Power Consumption

ADSP-BF5xx family members are designed to operate over a range of clock speeds and operating voltages, and include circuitry to ensure stable transitions between operating states.

Power consumption varies among ADSP-BF5xx family members. ADSP-BF533 power consumption ranges from 24 mW at 100 MHz and 0.8 volts to 644 mW at 756 MHz and 1.4 volts. These figures were measured while running the BDTI Block FIR benchmark with peripherals disabled; they include power for the PLL but not for the voltage regulator.

Cost

Pricing for ADSP-BF5xx chips in 10,000-unit quantities ranges from about $5 (for the ADSP-BF531 at 400 MHz) to about $40 (for the dual-core ADSP-BF561 at 756 MHz).

For Additional Information

The ADSP-BF5xx achieves a BDTImark2000™ score of 4190 at 756 MHz. Additional scores and information are available from BDTI. A complete analysis of this processor, including BDTI Benchmark™ results, is contained in BDTI’s report, Buyer’s Guide to DSP Processors, 2004 Edition.

Last updated January 2005.
