Case Study: Squeeze Code and Make Space for New Features

Submitted by BDTI on Mon, 04/07/2014 - 22:00

Semiconductor memory is increasing in capacity and becoming more cost-effective all the time. Yet, plenty of deeply embedded applications still exist for which every spare byte of RAM or flash memory is a precious commodity, especially those leveraging on-SoC storage versus discrete components. Tack on a performance-constrained DSP, intentionally speed-hampered to minimize power consumption, and a limited-capacity battery coupled with a multi-day or -week operating life expectation, and you've got a particularly challenging design on your hands.

Recently, a company with just such a project asked BDTI to "scrub" its code for possible resource-efficiency gains. The company's design team hoped to add capabilities to its existing code base, but the software (four sequentially executed sub-processes coordinated by a high-level master process) took so long to execute that adding features wasn't feasible. The company had already gone through the master process with a fine-tooth comb, and it contracted BDTI to scrutinize the signal-processing-focused sub-processes.

Code size limitations precluded many standard optimizations such as additional loop unrolling. And an initial perusal of the algorithms, fully coded in assembly language for efficiency, revealed no obvious "gotchas". The company's R&D team (who had originally developed the code) had done a solid job, and "all of the low-hanging fruit had already been picked," in the words of one BDTI engineer assigned to the project. Nevertheless, thanks to a fresh set of eyes and a big bag of tricks, BDTI's staff still made several key improvements.

BDTI's efficiency optimizations concentrated on three levels of abstraction:

  • Algorithm (mathematical)
  • Code structure (architectural), and
  • Low-level coding

For example, reallocating various functions from one sub-process to another to rebalance the load reduced the worst-case sub-process execution delay by more than 10%. Various coding improvements, such as function in-lining to reduce subroutine calls and returns and the consequent context save-and-restore overhead, further reduced algorithm execution time; each was seemingly minor, but together they added up. BDTI's engineers also made further coding suggestions that the company plans to implement when it migrates to a next-generation DSP with added hardware features.
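
The load-rebalancing idea can be illustrated with a minimal C sketch. It is not the client's code (which was written in assembly), and it assumes that each sub-process runs in its own time slot, so that the longest sub-process sets the timing budget. The function names and the cycle counts used below are hypothetical, chosen only for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Return the largest per-sub-process cycle count.
 * Assumption: the slowest sub-process determines the timing budget. */
uint32_t worst_case_cycles(const uint32_t load[], size_t n)
{
    uint32_t worst = 0;
    for (size_t i = 0; i < n; ++i) {
        if (load[i] > worst) {
            worst = load[i];
        }
    }
    return worst;
}

/* Model moving one function that costs `cost` cycles from sub-process
 * `from` to sub-process `to`. The total work is unchanged; only its
 * distribution changes. Returns 0 on success, -1 on invalid arguments. */
int move_function(uint32_t load[], size_t n,
                  size_t from, size_t to, uint32_t cost)
{
    if (from >= n || to >= n || from == to || cost > load[from]) {
        return -1;
    }
    load[from] -= cost;
    load[to] += cost;
    return 0;
}
```

With made-up loads of 9000, 6000, 7000, and 5000 cycles, moving a 1500-cycle function from the first sub-process to the last lowers the worst case from 9000 to 7500 cycles, while the total stays at 27000. Rebalancing of this kind frees headroom without making any individual function faster.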

Although in this particular case the client's aspiration was to free up processing headroom for additional planned software capabilities, the optimizations that BDTI delivered have other potential benefits. They might enable the processing of more incoming data in a given amount of time, for example by increasing an audio algorithm's sampling rate or number of processed channels, or a video algorithm's frame rate or per-frame resolution. Alternatively, they might enable using a slower-clocked CPU and other circuitry, thereby further increasing battery life. If you'd like to harness BDTI's software experts to optimize your code efficiency, contact Jeremy Giddings at +1 925 954 1411.
