Case Study: Algorithm Optimizations Efficiently Utilize Processor Features

Submitted by BDTI on Wed, 07/22/2015 - 22:00

Your company’s core expertise lies in developing innovative digital signal processing algorithms, not in porting and optimizing those algorithms to any of the dozens of processors that your customers use. But optimized implementations are often critical to enable customers to utilize your algorithms within their processing performance and power budgets. Good compilers are definitely helpful, but they invariably leave significant performance on the table. Engaging the processor experts at BDTI will ensure that your software is thoroughly optimized.

Take this month's case study: an audio algorithm developer with PC implementation expertise was interested in migrating its algorithms to run efficiently on the DSP core in a mobile application processor. In the mobile environment, optimization is essential both because the DSP is shared by multiple tasks and because efficient code helps maximize battery life. Initial porting by the algorithm company using the DSP vendor's toolset produced sub-par results, so BDTI was brought in to analyze the situation and assist with optimization. BDTI's code profiling and analysis quickly identified modules with the most optimization potential. BDTI then applied its deep experience with the target processor's architecture and tools, as well as its knowledge of algorithms, to rapidly achieve significant optimization gains.

In one case, for example, the client was using an inefficient FFT routine. BDTI had an optimized alternative in its own software library. Minor functional modifications enabled this optimized FFT to mate up with the remainder of the client's code, leading to a roughly 3x speed increase for the FFT. Another function, while coded in a C style reasonable for an x86 CPU, was non-optimal for a VLIW DSP. BDTI first rewrote the function to eliminate unnecessary arithmetic operations and memory loads and stores. BDTI then re-coded the routine in a specific C style that acted as a "map" to the desired optimal assembly language implementation, for example by combining instructions that could execute in the same clock cycle. BDTI then used the output of the C compiler to create an even more optimized assembly code version of the function. The final speed-up of this function was even more significant: approximately 20x.
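To make the first two steps concrete, here is a minimal sketch of this kind of restructuring. It is not the client's code or BDTI's implementation: the mixing operation, the function names mix_reference and mix_restructured, and the unroll factor of two are all assumptions chosen for illustration. The first version repeats a load and two multiplies on every iteration. The second hoists the loop-invariant work, factors the arithmetic by hand, and uses restrict plus a two-way unrolled body so that a compiler for a VLIW target has independent operations it may be able to schedule in the same cycle.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical "PC-style" version: reasonable on an x86 CPU, but the
 * compiler must assume out[] may alias *gain, so it reloads *gain and
 * repeats both multiplies on every iteration. */
void mix_reference(const float *a, const float *b, float *out,
                   size_t n, const float *gain)
{
    for (size_t i = 0; i < n; i++) {
        out[i] = a[i] * (*gain) * 0.5f + b[i] * (*gain) * 0.5f;
    }
}

/* Hypothetical restructured version (illustrative assumptions only):
 *  - restrict promises the arrays do not overlap, removing the
 *    aliasing barrier between loads and stores;
 *  - the gain is loaded and pre-scaled once, outside the loop;
 *  - the expression is factored to one add and one multiply;
 *  - two independent samples per iteration give a VLIW scheduler
 *    operations it can potentially issue in parallel. */
void mix_restructured(const float *restrict a, const float *restrict b,
                      float *restrict out, size_t n, const float *gain)
{
    const float g = (*gain) * 0.5f;   /* hoisted load and multiply */
    size_t i = 0;

    for (; i + 1 < n; i += 2) {
        const float s0 = a[i]     + b[i];
        const float s1 = a[i + 1] + b[i + 1];
        out[i]     = s0 * g;
        out[i + 1] = s1 * g;
    }
    if (i < n) {                      /* odd-length tail */
        out[i] = (a[i] + b[i]) * g;
    }
}
```

Because floating-point arithmetic is not associative, a compiler will generally not perform the factoring on its own without relaxed math options, and the two versions can differ in the last bits for arbitrary inputs. Whether the unrolled form actually packs into parallel issue slots depends on the target and its compiler, which is why inspecting the generated assembly, as described above, is the natural next step.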

BDTI's experience-driven insights into processor architectures and software optimizations for those architectures enabled the client to quickly deliver memory- and performance-tailored versions of its library of algorithms to new markets and customers. Find out how BDTI can help you ensure that your code achieves its full potential. For more information, contact Jeremy Giddings at +1-925-954-1411.
