Optimized DSP Software • Independent DSP Analysis



# **Comparing FPGAs and DSPs for Embedded Signal Processing**

Berkeley Design Technology, Inc., 2107 Dwight Way, Second Floor, Berkeley, California 94704 USA, +1 (510) 665-1600

info@BDTI.com, http://www.BDTI.com

© 2002 Berkeley Design Technology, Inc.

# **About BDTI**



### **ANALYSIS**

- Evaluation of processors' DSP performance and capabilities
- Advisory and consulting services
- Technical publications
- Technical training
- Custom benchmarking

### **DEVELOPMENT**

- Implementation of optimized DSP application software
- Implementation of optimized DSP software libraries
- Algorithm development


### **Presentation Outline**



What are the driving applications?

How are DSPs meeting application needs?

Why consider FPGAs?

How do DSPs and FPGAs stack up in terms of performance?

What other factors influence designers' decisions?




# **Communications: The "Killer App"**

Programmable DSP Revenues by Market, Jan-Aug 2002

| Market     | Share of revenues |
|------------|-------------------|
| Computer   | 9.2%              |
| Consumer   | 7.3%              |
| Wireline   | 6.9%              |
| Wireless   | 62.4%             |
| Automotive | 3.1%              |
| Other      | 11.1%             |

2002 Revenues: \$4.5 Billion (Projected)

Source: Forward Concepts


Stanford University, October 2002



# **Comms Apps: Two Types**

### Infrastructure

- Wired
  - E.g., xDSL, "cable," VoIP gateway
- Wireless
  - E.g., cellular, PCS, fixed wireless, satellite

### **Terminals**

- Portable
  - Battery-powered, size-constrained
- Non-portable (e.g., "CPE")


# **Terminal Requirements**



### Key criteria

- Sufficient performance
- Cost
- Energy efficiency
- Memory use
- Small-system integration support
- Packaging
- Tools
- Application-development infrastructure
- Chip-product roadmap


# **Infrastructure Requirements**



### Key criteria

- Board area per channel
- Power per channel
- Cost per channel
- Large-system integration support
- Tools
- Application-development infrastructure
- Architecture roadmap







# **Key Processing Technologies**

**DSPs** 

GPPs/DSP-enhanced GPPs

Reconfigurable architectures

- FPGAs
- Reconfigurable processors

Massively parallel processors

**ASSPs** 

**ASICs**

- Licensable cores
- Customizable cores
- Platform-based design


## **DSPs: The Incumbents**



Modern conventional DSPs introduced ~1986

- One instruction, one MAC per cycle
- Developed primarily for telecom applications

High-performance VLIW DSPs introduced ~1997

- Developed primarily for wireless infrastructure
- Speed focused:
  - Independent execution units support many instructions, MACs per cycle
  - Deeper pipelines and simpler instruction sets support higher clock rates
- Emphasis on compilability
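
To make the "MACs per cycle" figure concrete, here is a minimal Python sketch (not from the original slides) that writes one FIR filter output as a chain of multiply-accumulate operations. The function name `fir_output` and the sample values are illustrative assumptions. A conventional DSP would execute the loop body at roughly one MAC per cycle, while a VLIW DSP can issue several of these MACs in the same cycle.

```python
def fir_output(coeffs, samples):
    """Compute one FIR output sample as a sum of products.

    Each loop iteration is one multiply-accumulate (MAC):
    acc = acc + coeff * sample.
    """
    if len(coeffs) != len(samples):
        raise ValueError("need one sample per coefficient")
    acc = 0
    for c, x in zip(coeffs, samples):
        acc += c * x  # one MAC
    return acc


# Hypothetical 4-tap filter and the 4 most recent input samples.
taps = [1, 2, 3, 4]
recent = [10, 20, 30, 40]
y = fir_output(taps, recent)  # 4 MACs produce one output sample
```

Because the MACs in this loop are independent apart from the final sum, hardware with more multipliers can compute several of them at once.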









### **Other Infrastructure DSPs**

#### Texas Instruments TMS320C64xx

- 8-issue 16-bit fixed-point architecture
  - Up to four 16-bit MACs per cycle
  - Special instructions and co-processors for communications applications
  - Compatible with 'C62xx, 'C67xx
- Sampling at 600 MHz, \$111 (10 ku)

#### Analog Devices TigerSHARC

- 4-issue fixed- and floating-point
  - Up to eight 16-bit fixed-point MACs per cycle
  - Special instructions for 3G base stations
  - High memory bandwidth (8 GB/s)
- Shipping at 250 MHz, \$175 (10 ku)
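
As a rough worked example (not from the original slides), the peak 16-bit MAC throughput implied by the figures above is MACs per cycle multiplied by clock rate. The helper name `peak_mmacs` is an assumption for illustration, and the result is a theoretical ceiling rather than a measured benchmark score.

```python
def peak_mmacs(macs_per_cycle, clock_mhz):
    """Peak millions of MACs per second = MACs per cycle * clock in MHz."""
    return macs_per_cycle * clock_mhz


# Figures taken from the two processor descriptions above.
c64xx = peak_mmacs(4, 600)       # TMS320C64xx: 4 MACs/cycle at 600 MHz
tigersharc = peak_mmacs(8, 250)  # TigerSHARC: 8 MACs/cycle at 250 MHz
```

This gives 2400 MMACS for the 'C64xx and 2000 MMACS for the TigerSHARC. Sustained throughput in a real application also depends on memory bandwidth and on how much non-MAC work the algorithm contains.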


## **DSP Processors**



Strengths and Weaknesses

- ↑ DSP performance, efficiency strong compared to other off-the-shelf processors
- ↓ But may not be adequate for demanding tasks
- ↑ Relatively easy to program
  - ↓ But compilers are often inefficient
  - ↓ And 'C6xxx processors are an assembly programmer's worst nightmare
- ↑ Good DSP-oriented dev. tools, infrastructure
  - ↑ TI's dev. infrastructure is particularly good
  - ↓ But mediocre dev. infrastructure for non-DSP tasks


### **DSP Processors**

Strengths and Weaknesses

- ↑ Relatively low development cost, risk
  - ↑ Mature technology
  - ↑ Large, experienced developer base
  - ↑ Fast time-to-market
  - ↑ Some architectures available from multiple vendors
  - ↓ But some vendors' roadmaps are unclear
- ↓ Relatively limited product offerings
  - ↑ But products offer strong, relevant integration





# **Why Consider FPGAs?**

"As the industry shifts from second-generation, 2G, to 3G wireless we see the percentage of the physical layer MIPS that reside in the DSP dropping from essentially 100 percent in today's technology for GSM <u>to about 10 percent</u> for wideband code-division multiple access (WCDMA)."

Texas Instruments IEEE Communications Magazine January 2000


### **FPGAs**

Field-Programmable Gate Arrays

An amorphous "sea" of reconfigurable logic with reconfigurable interconnect

 Possibly interspersed with fixed-logic resources, e.g., processors, multipliers

Potential for very high parallelism

Historically used for prototyping and "glue logic," but becoming more sophisticated

- DSP-oriented architecture features
- DSP-oriented tools and design libraries
  - Viterbi, Turbo, and Reed-Solomon coders and decoders, FIR filters, FFTs,...

Key DSP players: Altera and Xilinx




# **Altera Stratix**

High-end, DSP-enhanced FPGAs

- IP blocks
  - Filters, FFTs, Viterbi decoders,...
  - Nios processor
  - Third-party IP, e.g., DMA controllers
- DSP tools
  - Parameterized IP block generators
  - Simulink to FPGA link
  - C+Simulink to FPGA design flow
- Sampling now; production end of 2002
- Prices begin at \$170 (1 ku)









### **FPGAs**

Strengths and Weaknesses

- ↑ Massive performance gains on some algorithms
- ↑ Architectural flexibility can yield efficiency
  - ↑ Adjust data widths throughout algorithm
  - ↑ Parallelism where you need it
  - ↑ Massive on-chip memory bandwidth
- ↓ Efficiency compromised by generality
  - Embedded MAC units and memory blocks improve efficiency but reduce generality
- ↑ Re-use hardware for multiple tasks
- ↑ Field reconfigurability (for some products)
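
One way to see the value of adjustable data widths: in fixed-point arithmetic, the number of bits needed grows predictably through a sum of products, and an FPGA datapath can be sized to that width instead of a processor's fixed word length. The sketch below applies the standard conservative bit-growth rule for signed values. The function name `accumulator_bits` and the example widths are illustrative assumptions, not figures from the slides.

```python
def accumulator_bits(data_bits, coeff_bits, num_taps):
    """Conservative signed accumulator width for a sum of num_taps products.

    A product of a data_bits-wide and a coeff_bits-wide signed value fits
    in data_bits + coeff_bits bits; summing num_taps such products can add
    up to ceil(log2(num_taps)) further bits.
    """
    if num_taps < 1:
        raise ValueError("num_taps must be at least 1")
    product_bits = data_bits + coeff_bits
    growth_bits = (num_taps - 1).bit_length()  # equals ceil(log2(num_taps))
    return product_bits + growth_bits


narrow = accumulator_bits(4, 8, 16)   # 4-bit data, 8-bit coefficients
wide = accumulator_bits(16, 16, 64)   # 16-bit data, 16-bit coefficients
```

Here the narrow case needs a 16-bit accumulator and the wide case needs 38 bits. A fixed-width processor must round both up to its native word sizes, whereas FPGA logic can be built at exactly these widths.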


## **FPGAs**



Strengths and Weaknesses

- ↑ Potentially good cost and power efficiency
  - ↓ But prices and power consumption are much higher than DSPs'
- ↓ Development is long and complicated
  - ↓ Design flow is unfamiliar to most DSP engineers
  - ↑ But cost and complexity are much lower than ASICs'
  - ↑ And processor cores reduce development burden
- ↓ Development infrastructure badly lags DSPs'
  - ↓ DSP-oriented tools are immature
- Xilinx has mature products, but others are playing catch-up




# **Performance Analysis**

- Comparing the performance of off-the-shelf DSPs to that of FPGAs is tricky
- Common MMACS metric is oversimplified to the point of absurdity
  - FPGA vendors use distributed-arithmetic benchmark implementations that require fixed coefficients
  - MMACS metric overlooks need to dedicate resources to non-MAC tasks
  - Many important DSP algorithms don't use MACs at all!
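
For readers unfamiliar with distributed arithmetic, the following minimal Python sketch shows the general technique. It is an illustration only, not any vendor's benchmark code, and the function names `build_lut` and `da_dot` are assumptions. The sum of products is computed with table look-ups, shifts, and adds instead of multipliers, and the table is precomputed from the coefficients, which is why they must be fixed.

```python
def build_lut(coeffs):
    """Precompute every possible sum of coefficients (2**K entries).

    Entry `addr` holds the sum of coeffs[i] for each bit i set in addr.
    The table is built ahead of time, so the coefficients cannot change
    at run time without rebuilding it.
    """
    k = len(coeffs)
    return [
        sum(c for i, c in enumerate(coeffs) if (addr >> i) & 1)
        for addr in range(1 << k)
    ]


def da_dot(lut, samples, bits):
    """Dot product of the fixed coefficients with two's-complement samples.

    Each sample must fit in `bits` bits. No multiplications are used.
    """
    acc = 0
    for b in range(bits):
        # Gather bit b of every sample into a table address.
        addr = 0
        for i, x in enumerate(samples):
            addr |= ((x >> b) & 1) << i
        term = lut[addr] << b
        # The top bit of a two's-complement value has negative weight.
        acc += -term if b == bits - 1 else term
    return acc


coeffs = [3, -1, 4, 2]       # hypothetical fixed coefficients
samples = [5, -7, 2, -8]     # hypothetical 4-bit signed samples
result = da_dot(build_lut(coeffs), samples, bits=4)
```

The result, 14, matches the ordinary sum of products 3·5 + (−1)(−7) + 4·2 + 2·(−8). The table has 2^K entries for K coefficients, so practical designs split long filters into several small tables.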


# **Alternative Approach: Application Benchmarks**



Use a full application, e.g., N channels of an OFDM receiver

#### Hazards:

- Applications tend to be ill-defined
- Hand-optimization usually required in real-world applications
  - Costly, time-consuming to implement
  - Evaluates programmer as much as processor
  - What is a "reasonable" benchmark implementation?




# **Solution: Simplified Application Benchmark**

BDTI's benchmark is based on a simplified OFDM receiver

- Closely resembles a real-world application
- Simplified to enable optimized implementations
- Constrained to ensure consistent, reasonable implementation practices

### Benchmark goals:

- Maximize the number of channels
- Minimize the cost per channel


# **Benchmark Overview**


### Flexibility is an asset:

- Algorithms range from table look-ups to MAC-intensive transforms
- Data sizes range from 4 to 16 bits
- Data rates range from 40 to 320 MB/s
- Data includes real and complex values






# **Benchmark Results**



|                  | Motorola<br>MSC8101<br>(300 MHz) | Altera Stratix<br>1S20-6<br>(Projected) | Altera Stratix<br>1S80-6<br>(Preliminary) |
|------------------|----------------------------------|-----------------------------------------|-------------------------------------------|
| Channels         | <<1                              | ~10                                     | ~50                                       |
| Cost (1 ku)      | \$140                            | \$325                                   | \$3,480                                   |
| Cost per channel | ~\$500                           | ~\$10                                   | ~\$50                                     |

These results are approximate. For full results, see BDTI's report, FPGAs for DSP.







# **Why Use a DSP?**

- Many applications are not amenable to FPGA implementations
  - Parallelism is sometimes inherently limited
  - Ultimate speed is not always the first priority
- FPGAs are still too expensive for terminal applications
- FPGA energy efficiency is still an unknown
- Implementing a complex algorithm is much more difficult on an FPGA than on a DSP
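
One standard way to reason about inherently limited parallelism is Amdahl's law, which is not part of the original slides and is offered here only as an illustration. If a fraction of the work cannot be parallelized, that serial portion caps the speedup no matter how many parallel hardware units an FPGA provides. The function name and the example fractions are assumptions.

```python
def amdahl_speedup(parallel_fraction, n_units):
    """Speedup over one unit when only part of the work runs in parallel.

    parallel_fraction: share of the work (0.0 to 1.0) that can be spread
    across n_units parallel units; the remainder runs serially.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if n_units < 1:
        raise ValueError("n_units must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_units)


# Hypothetical workloads on 1000 parallel units.
half_parallel = amdahl_speedup(0.5, 1000)   # stays below 2x
fully_parallel = amdahl_speedup(1.0, 1000)  # scales with the unit count
```

With half of the work serial, even 1000 parallel units give less than a 2x speedup, so a massively parallel device offers little advantage for such a task.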


## **Conclusions**



- High-end FPGAs can wallop DSPs on computation-intensive, highly parallelizable tasks
- FPGAs are expensive, but they can beat DSPs in terms of performance per dollar
- DSPs have the advantage in development infrastructure, time-to-market,...
- The "best" architecture depends on the application
- Heterogeneous architectures, e.g., combining DSP and FPGA components, are a key trend


