Case Study: Making Compilers Smarter

Submitted by BDTI on Wed, 05/28/2008 - 17:00

Fifteen years ago, DSP engineers expected to write and optimize most of their software in assembly language, and they did so on DSP processors with obscure, highly specialized instruction sets. Back then, compilers for DSP processors were inefficient and couldn’t use many of the processors’ specialized performance-improving features. If you wanted to use bit-reversed addressing or circular buffers, or to fill delay slots, for example, you had to write that code yourself.

Today, most embedded applications are too big to be implemented entirely in assembly language. As a result, if you’re a processor vendor targeting applications that require signal processing (which, at this point, includes nearly all processor vendors and nearly all embedded applications), you must have a competent, DSP-savvy compiler.  It’s not enough to offer a powerful processor architecture; you have to give your customers the means to access that architectural power without requiring them to spend months and months hand-tweaking assembly code.  This is easier said than done, of course, since DSP algorithms are highly parallel and don’t lend themselves well to expression in C.
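
To illustrate why such algorithms sit awkwardly in C, here is a minimal sketch (not taken from the BDTI study) of an FIR filter that keeps its delay line in a circular buffer. On many DSP processors the wraparound is handled by addressing hardware; in portable C it has to be spelled out with explicit index arithmetic, which the compiler must then recognize before it can use the hardware feature. The names fir_state, fir_step, and NTAPS are hypothetical and chosen only for this example.

```c
#include <stddef.h>

#define NTAPS 4  /* hypothetical filter length, chosen for illustration */

typedef struct {
    float delay[NTAPS]; /* circular delay line */
    size_t head;        /* index of the most recent sample */
} fir_state;

/* Push one input sample into the delay line and return one output sample. */
static float fir_step(fir_state *s, const float coeff[NTAPS], float x)
{
    /* Advance the write position, wrapping around manually. */
    s->head = (s->head + 1) % NTAPS;
    s->delay[s->head] = x;

    float acc = 0.0f;
    size_t idx = s->head;
    for (size_t k = 0; k < NTAPS; ++k) {
        /* coeff[0] weights the newest sample, coeff[NTAPS-1] the oldest. */
        acc += coeff[k] * s->delay[idx];
        /* Step backwards through the buffer with explicit wraparound. */
        idx = (idx == 0) ? NTAPS - 1 : idx - 1;
    }
    return acc;
}
```

Starting from a zero-initialized fir_state, each call to fir_step consumes one sample and produces one filtered output. Whether the modulo and conditional wraparound are mapped onto hardware circular addressing depends entirely on the compiler and target.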

A large processor company that wanted to evaluate its compiler’s performance on signal processing algorithms recently contacted BDTI. The company needed a baseline performance evaluation with which to compare future versions of the compiler, and also needed advice on how to modify the compiler so that it would be more efficient on typical signal processing software.

BDTI used two of its application-oriented benchmarks, the BDTI Video Encoder and Decoder Benchmarks, to evaluate the compiler. These benchmarks include a range of common signal-processing computations, and their C code represents a variety of coding techniques, from computationally intensive tight loops to data-dependent control code. As such, these benchmarks provide a good measure of a compiler’s performance on many types of multimedia software.

BDTI compiled the benchmarks on the processor, then profiled the code to identify the top performance bottlenecks. BDTI hand-optimized those sections in assembly language, and compared the hand-optimized version to the compiled version to identify changes to the compiler that would yield more efficient code.

By running a variety of tests (with different compiler switches, for example) and extensively comparing the compiled code with the hand-optimized code, BDTI was able to make several recommendations to the processor vendor for improving the compiler’s performance. These included optimizations related to vectorization, data buffer alignment, and loops. Each of these optimizations would result in generated code running around four to five times faster than the original code. Using this information, the processor vendor was able to plan and prioritize the implementation of optimizations in future versions of the compiler, and to use the profile information to track the performance of subsequent compiler versions against that of the original.
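
As a rough illustration of the kind of source code these three areas concern, the following sketch computes a prediction residual for one block of pixels, a typical video-encoder inner loop. It is not BDTI benchmark code and says nothing about the vendor's compiler; the names block_residual, BLOCK_PIXELS, cur_block, pred_block, and residual_block are hypothetical. The sketch assumes a C11 compiler: restrict promises that the buffers do not overlap, _Alignas requests 16-byte alignment for the buffers, and the fixed trip count gives the loop optimizer a known iteration count.

```c
#include <stddef.h>
#include <stdint.h>

#define BLOCK_PIXELS 256  /* hypothetical 16x16 block, stored contiguously */

/* Buffers with a requested 16-byte alignment (C11 _Alignas). */
static _Alignas(16) uint8_t cur_block[BLOCK_PIXELS];
static _Alignas(16) uint8_t pred_block[BLOCK_PIXELS];
static _Alignas(16) int16_t residual_block[BLOCK_PIXELS];

/* residual[i] = cur[i] - pred[i] for every pixel in one block.
 * restrict tells the compiler the output never overlaps the inputs,
 * so iterations are independent and may be executed in parallel. */
static void block_residual(int16_t *restrict residual,
                           const uint8_t *restrict cur,
                           const uint8_t *restrict pred)
{
    for (size_t i = 0; i < BLOCK_PIXELS; ++i) {
        residual[i] = (int16_t)(cur[i] - pred[i]);
    }
}
```

Calling block_residual(residual_block, cur_block, pred_block) processes one block. Whether the loop is actually vectorized, and whether the alignment is exploited, depends on the compiler, its switches, and the target processor.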

To learn how BDTI can help you evaluate and improve your development tools, contact Jeremy Giddings at +1 (925) 954 1411 or giddings@BDTI.com.
