Case Study: Shoehorning Maximum Signal Processing into Minimal Processors

Submitted by BDTI on Wed, 03/19/2008 - 17:00

Digital signal processing (DSP) algorithms are increasingly important in embedded systems. For example, compute-intensive multimedia functions are finding their way into applications ranging from toys to appliances to telephones. But in many of these systems, cost constraints dictate a processor with minimal horsepower and few, if any, signal-processing-specific features.

 
A classic example of this kind of processor is the ARM7. This architecture was introduced in 1993, has no DSP capabilities whatsoever, and maxes out at about 150 MHz. The ARM7 is often used as the processing engine in low-cost microcontroller chips, which typically run at clock rates that are much lower than this maximum. Even so, ARM7-based chips are still chosen for new embedded products (including products that require signal processing) because of these chips’ maturity and low cost. So how do you wrangle sufficient DSP performance out of a processor that wasn’t designed for DSP? And how do you do it quickly?
 
To obtain the most efficient code, signal processing software typically must be optimized at four distinct levels. First, the software architecture and data flow must be designed to take maximum advantage of the processor’s resources. On a processor like the ARM7, for example, it’s important to minimize memory bandwidth requirements, since the processor uses a von Neumann architecture in which instructions and data share a single bus. Second, the appropriate data types must be selected: too big and you’re wasting resources, too small and overflow or lost precision may keep your system from working. There’s no sense using 32-bit data if 16 bits get you what you need. Third, the software must be optimized at the algorithm level, perhaps by combining multiple algorithms into a single processing step or by substituting one algorithm for another. And last, the chosen algorithms must be mapped into processor instructions in a way that is clean and efficient, minimizing cycle-wasters like pipeline stalls and cache misses. Typically this last step requires assembly-level optimizations to squeeze out every possible cycle.
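
The second and third levels can be illustrated with a minimal C sketch. It is not taken from the project described in this article; the function names, the luma-plus-threshold task, and the weights 77/150/29 (a common 8-bit integer approximation of 0.299, 0.587, and 0.114) are assumptions chosen for illustration. The sketch keeps the intermediate sum within 16 bits and fuses two processing steps into one pass, so the intermediate grayscale image never travels over the bus.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Integer luma weights scaled by 256. They sum to 256, so the weighted
 * sum peaks at 255 * 256 = 65280 and fits in an unsigned 16-bit value. */
enum { W_R = 77, W_G = 150, W_B = 29 };

/* Hypothetical helper: 8-bit luma from 8-bit R, G, B using integer math only. */
static uint8_t luma_u8(uint8_t r, uint8_t g, uint8_t b)
{
    uint16_t acc = (uint16_t)(W_R * r + W_G * g + W_B * b);
    return (uint8_t)(acc >> 8);
}

/* Two-pass reference: writes a full grayscale image, then reads it back. */
void luma_then_threshold_two_pass(const uint8_t *rgb, uint8_t *gray,
                                  uint8_t *mask, size_t n_pixels,
                                  uint8_t threshold)
{
    assert(rgb != NULL && gray != NULL && mask != NULL);
    for (size_t i = 0; i < n_pixels; i++) {
        gray[i] = luma_u8(rgb[3 * i], rgb[3 * i + 1], rgb[3 * i + 2]);
    }
    for (size_t i = 0; i < n_pixels; i++) {
        mask[i] = (gray[i] >= threshold) ? 255 : 0;
    }
}

/* Fused version: same result, but each pixel is read once and written once,
 * and no intermediate buffer is needed. */
void luma_threshold_fused(const uint8_t *rgb, uint8_t *mask,
                          size_t n_pixels, uint8_t threshold)
{
    assert(rgb != NULL && mask != NULL);
    for (size_t i = 0; i < n_pixels; i++) {
        uint8_t y = luma_u8(rgb[3 * i], rgb[3 * i + 1], rgb[3 * i + 2]);
        mask[i] = (y >= threshold) ? 255 : 0;
    }
}
```

Both functions produce the same mask; the fused one simply avoids one full write and one full read of the image, which is the kind of saving that matters on a shared instruction and data bus.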
 
Obtaining the best result quickly requires expertise at all four levels, and a good understanding of where you’re likely to get the best return on your optimization efforts. Unfortunately, engineers sometimes equate “optimization” with “assembly code optimization,” and neglect to assess the potential benefits of higher-level optimizations. This can be a serious error; it’s painful and time-consuming to revisit higher levels of optimization after you’ve already created an optimized assembly implementation, since you may have to toss out your code and start over. And in fact, for very simple processors like the ARM7, assembly language optimization is likely to yield minimal performance improvements compared to what can be achieved by optimizing the software at a higher level.
 
BDTI recently collaborated with a company that was developing a low-cost consumer product that needed to handle demanding image processing. The algorithms typically used for this kind of image processing are computationally expensive, but for cost reasons the company had chosen an inexpensive, 81 MHz ARM7-based MCU chip. Achieving the needed functionality on this processor would clearly pose a significant challenge, assuming it could be done at all.
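
To get a feel for how tight such a budget can be, a rough cycles-per-pixel estimate helps. The sketch below is only illustrative: the article gives the 81 MHz clock rate, but not the image size or frame rate, so the 320x240 at 15 frames per second used in the usage note is a made-up assumption, as is the function name.

```c
#include <assert.h>
#include <stdint.h>

/* Average CPU cycles available per pixel, rounded down.
 * Ignores everything else the processor has to do (I/O, control code),
 * so the real budget is smaller than this figure. */
uint32_t cycles_per_pixel(uint32_t clock_hz, uint32_t width,
                          uint32_t height, uint32_t frames_per_second)
{
    uint64_t pixels_per_second =
        (uint64_t)width * height * frames_per_second;
    assert(pixels_per_second > 0);
    return (uint32_t)(clock_hz / pixels_per_second);
}
```

Under those assumed figures, `cycles_per_pixel(81000000u, 320, 240, 15)` returns 70: every load, store, arithmetic operation, and loop branch for a pixel has to fit in roughly 70 cycles.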
 
Initially, the company asked BDTI to perform a feasibility study and evaluate the likely worst-case performance of commonly used image processing algorithms on the ARM7 chip. The results were not encouraging. It was clear that, even with expert assembly-level optimization, these algorithms would not run with sufficient speed on this chip. However, BDTI was able to propose an alternative solution—a solution that didn’t require a processor upgrade. Based on a thorough knowledge of both the image processing algorithms and the processor, BDTI identified a few key modifications to the algorithms that would greatly reduce the computational load on the ARM7 and bring it within acceptable performance targets. There was a trade-off, of course; the quality of the images would be slightly reduced, but would still be considered extremely good for the class of product being developed.
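
The article does not say which modifications were made. As a purely hypothetical illustration of trading a little image quality for a large reduction in computation, the C sketch below averages each 2x2 block of an 8-bit grayscale image into one pixel, so that any later processing stage handles a quarter as many pixels. The function name and the choice of a box filter are assumptions, not BDTI's method.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Halve an 8-bit grayscale image in each dimension by averaging 2x2 blocks.
 * src holds width * height pixels in row-major order; dst must have room
 * for (width / 2) * (height / 2) pixels. Width and height must be even. */
void downsample_2x2(const uint8_t *src, size_t width, size_t height,
                    uint8_t *dst)
{
    assert(src != NULL && dst != NULL);
    assert(width % 2 == 0 && height % 2 == 0);

    size_t out_w = width / 2;
    for (size_t y = 0; y < height; y += 2) {
        const uint8_t *row0 = src + y * width;
        const uint8_t *row1 = row0 + width;
        uint8_t *out = dst + (y / 2) * out_w;
        for (size_t x = 0; x < width; x += 2) {
            /* Four 8-bit values sum to at most 1020, so 16 bits suffice. */
            uint16_t sum = (uint16_t)(row0[x] + row0[x + 1] +
                                      row1[x] + row1[x + 1]);
            out[x / 2] = (uint8_t)((sum + 2) >> 2); /* rounded average */
        }
    }
}
```

The cost is lost fine detail; whether that is acceptable depends on the product, which is exactly the kind of judgment the feasibility study had to make.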
 
The company liked the approach and asked BDTI to move forward with the project. BDTI was able to modify the algorithms and quickly get the software up and running on the ARM7. Because BDTI had identified an appropriate optimization strategy up front, it was able to avoid extensive assembly-level optimizations, and the project was completed ahead of schedule. As a result, the company was able to demo the product at an important conference and attract the attention of a large distributor, who subsequently bought the product. BDTI’s expertise in multi-level software optimization was a key ingredient in the product’s success.
 
To learn how BDTI can help ensure that your software hits its performance targets, whether on a high-performance DSP processor or a humble microcontroller, contact Jeremy Giddings at +1 (925) 954 1411 or giddings@BDTI.com.
 
