Insight, Analysis, and Advice on Signal Processing Technology



# **Processors for Embedded Digital Signal Processing**

Jeff Bier, President

**BDTI** 

Oakland, California USA +1 (510) 451-1800

info@BDTI.com http://www.BDTI.com

© 2008 Berkeley Design Technology, Inc.



### **Topics**

**Definitions** 

DSP algorithms shape DSPs

Processor selection criteria

DSPs vs. GPPs

Comparing performance

Conclusions




#### **Definitions**

Microprocessors–General-Purpose Processors (GPPs)

- 32-bit GPPs for embedded applications
  - E.g., ARM ARM7

Digital Signal Processors (DSPs)

- Microprocessors specialized for signal processing applications
  - E.g., Texas Instruments C55x+

DSP-enhanced GPPs and hybrids

- GPPs with added DSP features, or processors designed with DSP and GPP attributes
  - E.g., MIPS MIPS24KE, Microchip dsPIC, Analog Devices Blackfin




#### **DSP Algorithms Shape DSPs**

**How Signal Processing is Different From Other Tasks** 

- Very computationally demanding
- Requires attention to numeric fidelity
- High memory bandwidth requirements
- Streaming data—and lots of it
- Predictable data access patterns
- Execution-time locality
- Math-centric
- Real-time constraints
- Standards: algorithms, interfaces












#### **Performance**

- Data path
  - Computational resources
  - SIMD
- Memory architecture
  - Harvard vs. Von Neumann
  - Cache vs. SRAM with DMA
- Real-time considerations
  - Non-determinism
  - Dynamic features


## **Comparing DSPs and GPPs**

**Data Path**



#### Low-end DSP

- Dedicated hardware performs all key arithmetic operations in 1 cycle
- Usually 16-bit, fractional and integer
- Hardware support for managing numeric fidelity
  - Guard bits, saturation, rounding modes, ...
- Limited bit-manipulation capabilities

#### Low-end GPP

- Multiplies often take >1 cycle
- Multi-bit shifts often take >1 cycle
- Usually 32-bit, integer only
- Saturation, rounding typically take extra cycles
- May have superior bit-manipulation capabilities
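
The numeric-fidelity features named above (guard bits, saturation, rounding) can be modeled in a few lines. This is a minimal sketch, assuming the common Q15 fractional format (16-bit values representing -1.0 to just under 1.0); the function names are illustrative and do not come from any particular processor or library. A DSP typically does all of this in hardware within the multiply-accumulate, while a low-end GPP spends extra instructions on the clamp and the rounding.

```python
Q15_MAX = 0x7FFF    # largest Q15 value, just under +1.0
Q15_MIN = -0x8000   # smallest Q15 value, exactly -1.0


def saturate_q15(value: int) -> int:
    """Clamp to the signed 16-bit range instead of wrapping around."""
    return max(Q15_MIN, min(Q15_MAX, value))


def wrap_int16(value: int) -> int:
    """What plain 16-bit two's-complement storage does on overflow."""
    return ((value + 0x8000) & 0xFFFF) - 0x8000


def mac_q15(xs, ys) -> int:
    """Multiply-accumulate two Q15 vectors into a wide accumulator.

    Python integers are unbounded, so 'acc' stands in for an accumulator
    with guard bits: intermediate sums may grow past 32 bits safely.
    """
    acc = 0
    for x, y in zip(xs, ys):
        acc += x * y  # Q15 * Q15 gives a Q30 product
    return acc


def q30_to_q15(acc: int) -> int:
    """Round to nearest (add half an LSB), shift back to Q15, saturate."""
    rounded = (acc + (1 << 14)) >> 15
    return saturate_q15(rounded)


half = 0x4000  # 0.5 in Q15
total = mac_q15([half] * 4, [half] * 4)  # 4 * 0.25 = 1.0, not representable
saturated = q30_to_q15(total)            # clamps to 0x7FFF
wrapped = wrap_int16((total + (1 << 14)) >> 15)  # wraps to -32768
```

Without saturation the result flips from nearly +1.0 to -1.0, which is why the extra cycles a GPP spends on clamping are usually not optional in signal processing code.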




### **Memory Structure**



- Harvard vs. Von Neumann
  - Harvard: separate memories for data and instructions
  - Von Neumann: single memory for data and instructions
- Bandwidth between processor and on-chip memory
- Size of on-chip memory
  - Larger memory is better for performance, but hurts cost and increases power
  - Fetching data from external memory consumes cycles and power
- Memory control
  - Caches
  - SRAM with DMA
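
A rough way to see why memory structure matters: count the memory accesses a single filter tap needs and divide by what the memory system can deliver per cycle. The sketch below is a deliberately simplified model, assuming memory bandwidth is the only bottleneck and that each tap needs one instruction fetch, one sample, and one coefficient; the three-access-per-cycle case (two data memories plus an instruction memory) is an assumed variant, not something stated in the slides.

```python
import math

# Assumed workload: one FIR tap per multiply-accumulate, needing
# 1 instruction fetch + 1 data sample + 1 coefficient = 3 accesses.
ACCESSES_PER_TAP = 3


def cycles_per_tap(accesses_per_cycle: int) -> int:
    """Cycles per tap when memory bandwidth is the only limit."""
    return math.ceil(ACCESSES_PER_TAP / accesses_per_cycle)


single_memory = cycles_per_tap(1)    # Von Neumann: one shared access per cycle
split_memories = cycles_per_tap(2)   # Harvard: instruction + data in parallel
dual_data_banks = cycles_per_tap(3)  # Harvard with two data memories (assumed)
```

Under these assumptions the three cases need 3, 2, and 1 cycles per tap. Real processors blur the picture with caches, loop buffers, and multi-ported memories.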

















### **Development Effort**



### Compiler friendliness

- GPPs generally have the advantage
- SIMD difficult for compilers, whether GPP or DSP
  - Often requires assembly programming or use of intrinsics—both of which complicate software development

### **Development support**

- DSPs have more 3rd-party DSP-oriented IP, DSP-oriented tools
- GPPs have better non-DSP-oriented support




### **Comparing DSPs and GPPs**

**Instruction Set** 

| Low-end DSP | Low-end GPP |
|---|---|
| Specialized, complex instructions | General-purpose instructions |
| Multiple operations per instruction | Typically only one operation per instruction |
| Poor orthogonality | Good orthogonality |
| `mac x0,y0,a x:(r0)+,x0 y:(r4)+,y0` | `mpy r2,r3,r4`<br>`add r4,r5,r5`<br>`mov (r0),r2`<br>`mov (r1),r3`<br>`inc r0`<br>`inc r1` |
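
Both instruction listings above perform one step of the same loop: multiply two operands, add the product to a running sum, load the next two operands, and advance both pointers. A minimal sketch of that loop, with instruction counts taken directly from the listings (one instruction per step for the DSP, six for the GPP). The counts ignore loop control and say nothing about cycles per instruction, so treat them as a rough indication only.

```python
def dot_product(xs, ys) -> int:
    """The computation both listings implement, one element pair per step."""
    acc = 0
    for x, y in zip(xs, ys):
        acc += x * y  # multiply, accumulate; zip handles the pointer updates
    return acc


DSP_INSTRUCTIONS_PER_STEP = 1  # single mac with two parallel moves
GPP_INSTRUCTIONS_PER_STEP = 6  # mpy, add, two mov, two inc


def instruction_count(steps: int, per_step: int) -> int:
    """Instructions issued for the loop body only (no loop overhead)."""
    return steps * per_step


result = dot_product([1, 2, 3], [4, 5, 6])
dsp_total = instruction_count(64, DSP_INSTRUCTIONS_PER_STEP)
gpp_total = instruction_count(64, GPP_INSTRUCTIONS_PER_STEP)
```

For a 64-element dot product this gives 64 instructions against 384 for the loop body alone.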


### **Comparing DSPs and GPPs**

**Development Support** 

|                                    | DSPs                                                                | GPPs                                                               |
|------------------------------------|---------------------------------------------------------------------|--------------------------------------------------------------------|
| Tools in general                   | Primitive to moderately sophisticated                               | Primitive to very sophisticated                                    |
| DSP-specific tool support          | Good to excellent<br>E.g., cycle-accurate simulators, DSP C extensions | Poor but improving<br>E.g., general lack of cycle-accurate simulators |
| 3rd-party DSP software support     | Poor to excellent                                                   | Limited but growing                                                |
| Non-DSP 3rd-party software support | Limited but growing<br>Few to moderate RTOS options                 | Extensive<br>Few to extensive RTOS options                         |
| Links with other high-level tools  | E.g., MATLAB                                                        | E.g., GUI builders                                                 |

















#### **Power**

- Parallelism
  - Parallel computation allows lower clock rate...
  - But may increase leakage current
- Suitability of instruction set
  - A better matched instruction set allows lower clock rate
- Dynamic features
  - Caches may cause data/instruction traffic increase
  - Superscalar hardware scheduling consumes power
- On-chip integration
  - Memory architecture
    - Fetching data from external source expensive
  - Smart peripherals aid parallelism, may enable more processor sleep time
- Power-management features
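
The link between parallelism, clock rate, and power can be sketched with the standard first-order model for switching power, P = a·C·V²·f. The capacitance, voltages, and clock rates below are hypothetical values chosen only to make the arithmetic visible, and the assumption that a lower clock permits a lower supply voltage is the usual rationale rather than something the slides state. Leakage, which the list above notes may increase, is not modeled.

```python
def dynamic_power(capacitance, voltage, clock_hz, activity=1.0):
    """First-order switching power in watts: activity * C * V^2 * f."""
    return activity * capacitance * voltage ** 2 * clock_hz


C_UNIT = 1e-9  # hypothetical switched capacitance of one execution unit (F)

# One unit at full speed.
one_unit = dynamic_power(C_UNIT, 1.2, 200e6)

# Two units at half the clock: same throughput, same power if V is unchanged.
two_units_same_v = dynamic_power(2 * C_UNIT, 1.2, 100e6)

# Two units at half the clock and a lower supply voltage (assumed possible).
two_units_lower_v = dynamic_power(2 * C_UNIT, 1.0, 100e6)

savings_ratio = two_units_lower_v / one_unit  # about 0.69 with these numbers
```

In this model the saving comes entirely from the voltage term: doubling the hardware and halving the clock is power-neutral on its own.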




### **Comparing Performance**

When evaluating processors for signal processing, application-specific, product-specific considerations dominate

Relative performance can vary dramatically depending on the benchmark

Vendor performance claims should be viewed skeptically

- "MIPS" = ...
- Benchmarks are a sharp tool

Performance is more than speed

- Cost/performance, energy efficiency, memory use...








#### When Should You Consider a DSP?

- You need maximum performance or efficiency on a signal-processing-heavy workload
- You have compatible software you want to re-use
- Your developers are already familiar with it
- You need limited non-signal-processing software
- You'll be developing demanding DSP software
- A DSP offers good off-the-shelf software for your application
- You don't need a full-featured operating system
- You need maximum execution-time determinism
- A DSP offers superior integration




#### When Should You Consider a GPP?

- A GPP offers sufficient performance and efficiency on your signal-processing-heavy workload
- You have compatible software you want to re-use
- You want to be able to switch vendors but not ISAs
- Your developers are already familiar with it
- You need extensive non-signal-processing software
- You won't be developing much DSP software
- A GPP offers good off-the-shelf software for your application
- You need a full-featured operating system
- Execution-time determinism is not critical
- A GPP offers superior integration




#### Can I Have the Best of Both Worlds?

Maybe.

Options include:

- Two processors
  - One or two chips
  - But: Cost; multiprocessor software development
- DSP-enhanced GPP
  - But: Typically compromise on DSP-oriented tools, software, integration
- Hybrid
  - But: Typically compromise on GPP-oriented tools, software
- "Application processors"
  - But: Tend to be focused on mobile multimedia applications


### For More (Free!) Information...

www.BDTI.com, www.InsideDSP.com

- Inside DSP newsletter
- Benchmark scores for dozens of processors, FPGAs, video solutions, etc.
- Pocket Guide to Processors for DSP
  - Basic stats on over 40 processors
- Articles, white papers, and presentation slides
  - Processor architectures and performance
  - Signal processing applications
  - Signal processing software optimization
- comp.dsp FAQ







### **Example: Video Processing**

- Computational demands: high
  - Example: color conversion
    - CIF (352 by 288 pixels), 15 fps, conversion (without any interpolation) requires over 18 million operations per second
- Numeric fidelity: 8 to 12-bit pixels
- High memory bandwidth
  - E.g., D1 video (720x480), 30 fps
    - (720 × 480 pixels) × (3 RGB values) × (8 bits) × (30 frames/second) = 31.1 Mbytes/second
- Highly parallelizable
- Predictable data access patterns
  - Motion estimation and compensation notable exceptions
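
The figures in this example are easy to reproduce. The sketch below recomputes the D1 bandwidth and the CIF pixel rate; the function names are illustrative. The slide gives only the total operation count for color conversion, so the figure of 12 operations per pixel used here is inferred from that total, not stated in the source.

```python
def pixel_rate(width: int, height: int, fps: int) -> int:
    """Pixels that must be processed each second."""
    return width * height * fps


def bandwidth_bytes_per_second(width, height, fps,
                               components=3, bits_per_component=8) -> int:
    """Raw, uncompressed video bandwidth in bytes per second."""
    bits = pixel_rate(width, height, fps) * components * bits_per_component
    return bits // 8


d1_bandwidth = bandwidth_bytes_per_second(720, 480, 30)  # 31,104,000 bytes/s
cif_pixels = pixel_rate(352, 288, 15)                    # 1,520,640 pixels/s

# Inferred, not from the slide: about 12 operations per pixel would
# account for the "over 18 million operations per second" figure.
ASSUMED_OPS_PER_PIXEL = 12
cif_ops = cif_pixels * ASSUMED_OPS_PER_PIXEL             # 18,247,680 ops/s
```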


## **Comparing DSPs and GPPs**

**SIMD Features**



#### Low-end DSP & GPP

DSPs: very limited SIMD features

- E.g., dual add, subtract of 16-bit fixed-point data

GPPs: No SIMD support

#### High-performance DSP & GPP

DSPs: limited to extensive SIMD features

- E.g., TigerSHARC
  - 4 x 32-bit float
  - 4 x 32-bit integer
  - 8 x 16-bit integer
  - 16 x 8-bit integer

GPPs: extensive SIMD features

- E.g., PowerPC 74xx
  - 4 x 32-bit float
  - 4 x 32-bit integer
  - 8 x 16-bit integer
  - 16 x 8-bit integer
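
The lane counts listed above are all consistent with 128 bits of data per operation split into equal-width elements. The sketch below shows that arithmetic, and also emulates the "dual add" idea from the low-end case: two independent 16-bit additions performed on one 32-bit word. It is an illustration in plain Python with hypothetical helper names, using unsigned lanes that wrap on overflow; real SIMD hardware often offers saturating variants as well.

```python
def lanes(register_bits: int, element_bits: int) -> int:
    """How many elements of a given width fit in one SIMD operation."""
    return register_bits // element_bits


def pack16x2(hi: int, lo: int) -> int:
    """Pack two 16-bit values into one 32-bit word."""
    return ((hi & 0xFFFF) << 16) | (lo & 0xFFFF)


def unpack16x2(word: int):
    """Split a 32-bit word back into its (hi, lo) 16-bit lanes."""
    return (word >> 16) & 0xFFFF, word & 0xFFFF


def dual_add16(a: int, b: int) -> int:
    """Add both 16-bit lanes at once; a carry never crosses into the other lane."""
    lo = (a + b) & 0xFFFF
    hi = ((a >> 16) + (b >> 16)) & 0xFFFF
    return (hi << 16) | lo


lane_counts = [lanes(128, bits) for bits in (32, 16, 8)]  # [4, 8, 16]

# Low lane overflows (0xFFFF + 1) and wraps to 0 without disturbing the high lane.
summed = unpack16x2(dual_add16(pack16x2(1, 0xFFFF), pack16x2(2, 1)))
```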









#### **Real-Time Considerations**

- Performance
  - Can the processor handle the load?
- Non-determinism
  - Non-determinism causes load variance
  - Complicates optimization and debugging
  - Caused by:
    - Dynamic processor features
    - Data-dependent algorithm behavior
    - Multi-tasking
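
"Can the processor handle the load?" reduces to a cycle budget: the clock rate divided by the sample rate gives the cycles available per sample, and the worst-case cost must fit inside it. A minimal sketch with hypothetical numbers (a 100 MHz clock, 48 kHz audio, and made-up average and worst-case cycle counts) shows why load variance matters even when the average fits comfortably.

```python
def cycles_per_sample(clock_hz: int, sample_rate_hz: int) -> int:
    """Whole processor cycles available before the next sample arrives."""
    return clock_hz // sample_rate_hz


def load_fraction(cycles_needed: int, clock_hz: int, sample_rate_hz: int) -> float:
    """Fraction of the per-sample budget consumed; above 1.0 misses the deadline."""
    return cycles_needed / cycles_per_sample(clock_hz, sample_rate_hz)


CLOCK_HZ = 100_000_000   # hypothetical processor clock
SAMPLE_RATE_HZ = 48_000  # hypothetical audio sample rate

budget = cycles_per_sample(CLOCK_HZ, SAMPLE_RATE_HZ)            # 2083 cycles

average_load = load_fraction(1500, CLOCK_HZ, SAMPLE_RATE_HZ)    # fits
worst_case_load = load_fraction(2200, CLOCK_HZ, SAMPLE_RATE_HZ) # does not fit
```

With these numbers the average case uses about 72% of the budget, yet the worst case overruns it, so a hard-real-time design has to be sized for the worst case.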




### **Comparing DSPs and GPPs**

**Dynamic Features** 

Dynamic features are common in high-end GPPs to boost performance

- Superscalar execution
- Caches
- Branch prediction
- Data-dependent instruction execution times

These features are occasionally used in DSPs, too




### **Comparing DSPs and GPPs**

**Dynamic Features**

#### Low-end GPPs and DSPs

GPPs: Dynamic caches common

DSPs: Rarely have dynamic features

#### High-performance GPPs and DSPs

GPPs: Moderate to extensive use of dynamic features

- Dynamic caches standard
- Superscalar execution, branch prediction common

DSPs: Mostly avoid dynamic features

- Cache is most common dynamic feature
- Superscalar execution rare
- Branch prediction sometimes used




### **Comparing DSPs and GPPs**

**Caches: Challenges** 

Caches work by lowering average access time

- They are effective at doing this in many applications
- But access times vary significantly

Some applications are sensitive to *maximum* access time

- E.g., many "hard-real-time" signal processing applications

Signal processing access patterns are often predictable

- Thus, DMA may be preferable to a cache
- Some caches provide pre-fetching capability
- Some DSPs' caches can be locked or configured as part cache, part SRAM
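
The gap between average and maximum access time can be made concrete with the usual weighted-average model of a cache. The hit rate and cycle counts below are hypothetical, chosen only for illustration.

```python
def average_access_cycles(hit_rate: float, hit_cycles: int, miss_cycles: int) -> float:
    """Expected cycles per access, weighting hits and misses by their rates."""
    return hit_rate * hit_cycles + (1.0 - hit_rate) * miss_cycles


def worst_case_access_cycles(hit_cycles: int, miss_cycles: int) -> int:
    """Cycles for the slowest possible access, which is a miss."""
    return max(hit_cycles, miss_cycles)


# Hypothetical cache: 95% hit rate, 1-cycle hit, 20-cycle miss.
average = average_access_cycles(0.95, 1, 20)  # about 1.95 cycles
worst = worst_case_access_cycles(1, 20)       # 20 cycles
```

A design budgeted on the average sees roughly 2 cycles per access; a hard-real-time design that cannot rule out a miss must budget 20. Locking part of the cache or using SRAM with DMA removes that uncertainty for the data placed there.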




### **Comparing DSPs and GPPs**

**Branch Prediction: Strengths and Weaknesses** 

In many applications, branch prediction is very accurate

- This includes signal processing applications, where most branches are part of for-next loops

Complex branch prediction algorithms introduce timing uncertainty

- It can be difficult to predict whether the prediction will be correct at any given instant
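
Why loop branches predict well can be seen with a small simulation. The sketch below uses a 2-bit saturating counter, a common textbook scheme chosen here purely for illustration; the slides do not say which scheme any particular processor uses. The loop shape (16 iterations, run 10 times) is likewise hypothetical.

```python
def loop_branch_outcomes(iterations: int, repeats: int):
    """Outcomes of a loop-closing branch: taken until the final iteration."""
    one_pass = [True] * (iterations - 1) + [False]
    return one_pass * repeats


def count_correct_predictions(outcomes, state: int = 2) -> int:
    """Simulate a 2-bit saturating counter; states 2 and 3 predict 'taken'."""
    correct = 0
    for taken in outcomes:
        if (state >= 2) == taken:
            correct += 1
        # Move toward 3 on taken, toward 0 on not taken, saturating at the ends.
        state = min(3, state + 1) if taken else max(0, state - 1)
    return correct


outcomes = loop_branch_outcomes(iterations=16, repeats=10)
correct = count_correct_predictions(outcomes)
accuracy = correct / len(outcomes)  # 150 of 160 branches, 93.75%
```

The only mispredictions are the loop exits, one per pass. Even in this simple case the cost of a given pass depends on predictor state; with more complex predictors and data-dependent branches the misses become much harder to anticipate, which is the timing uncertainty described above.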






#### **Comparing DSPs and GPPs**

**Trade-offs: Superscalar vs. VLIW**

Superscalar (high-performance GPPs, mostly)

- Increased hardware complexity
  - Silicon area, power consumption
- Dynamic behavior
  - Complex performance model, timing variability
- Increased performance with binary compatibility
- Decreased software complexity (programmer/compiler)

VLIW (high-performance DSPs, mostly)

- Decreased hardware complexity
- No dynamic behavior
- Binary compatibility difficult (downward direction)
- Increased software complexity


### **Comparing DSPs and GPPs**

**Data Path**

#### High-performance DSP

- Up to 8 arithmetic units
- Some specialized arithmetic units
  - E.g., MAC unit, Viterbi unit
- Support multiple data sizes
- Limited to excellent bit-manipulation capabilities
- Hardware support for managing fixed-point numeric fidelity

#### High-performance GPP

- 1-3 arithmetic units
- General-purpose arithmetic units
  - E.g., integer unit, floating-point unit
- Support multiple data sizes
- May have superior bit-manipulation capabilities
- Saturation, rounding typically take extra cycles









