Say you’ve just developed a new digital signal processing algorithm: a new audio or video codec, for example. The algorithm is intended to be ported to multiple embedded processors, including general-purpose processors (like the ARM9) and digital signal processors (like the Texas Instruments TMS320C55x). Porting an algorithm to an embedded processor is a lot of work, but many of the steps involved are the same regardless of the target processor. It therefore makes sense to create a version of the reference code that incorporates all of the processor-independent porting steps; there’s no sense reinventing the wheel for every processor.

By making the new version of the reference code “embedded-friendly” (that is, by adapting it to the specialized needs and constraints of embedded processors), you’ll make it much faster for software engineers to create optimized, processor-specific implementations. You’ll also reduce the chance of introducing bugs during the porting process, and you’ll help transfer knowledge of the algorithm to the software engineers. You may even increase the likelihood that your algorithm will become widely adopted, if that’s one of your goals. In this article, we’ll discuss techniques you can use to create a robust, embedded-friendly version of your signal-processing algorithm reference code.
The Porting Process
Porting a signal processing algorithm to an embedded processor typically involves four main steps, as shown in Figure 1. The steps are usually performed in sequence, though there may be some iteration between them.

Figure 1: Porting algorithm reference code to an embedded processor. 
The first step is to develop and test the algorithm in a high-level language such as MATLAB or C. This version of the algorithm is usually based on floating-point math because it’s much easier to develop and test new algorithms when you don’t have to worry about numeric effects.
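To make this concrete, here is a minimal sketch of what step-one reference code often looks like: a plain floating-point FIR filter written for clarity, not speed. The filter, its name, and its arguments are hypothetical examples for illustration, not code from any particular codec.

```c
#include <stddef.h>

/* Hypothetical floating-point reference code: a direct-form FIR
 * filter.  x must hold n + taps - 1 input samples; y receives
 * n output samples.  No attention is paid yet to numeric range
 * or efficiency -- that comes in later porting steps. */
void fir_float(const float *x, const float *h, float *y,
               size_t n, size_t taps)
{
    size_t i, k;
    for (i = 0; i < n; i++) {
        float acc = 0.0f;
        for (k = 0; k < taps; k++)
            acc += h[k] * x[i + k];
        y[i] = acc;
    }
}
```

Code at this stage should favor readability over everything else, since it serves as the executable specification against which all later versions are verified.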
The second step is to create a fixed-point version of the reference code. This step is required because most embedded processors are fixed-point devices, and it represents a significant portion of the porting effort.
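For example, a common approach in step two is to represent signals in a “Q” fixed-point format. The sketch below uses Q15, in which a 16-bit value v represents the real number v / 32768; the format choice and the helper names are assumptions for illustration, since the right format depends on the algorithm’s dynamic range.

```c
#include <stdint.h>

/* In Q15, the int16_t value v represents the real number v / 32768. */
static int16_t float_to_q15(double f)
{
    /* No rounding or saturation here; real porting code needs both. */
    return (int16_t)(f * 32768.0);
}

static int16_t q15_mul(int16_t a, int16_t b)
{
    /* Widen to 32 bits so the product can't overflow, then shift
     * back down to Q15. */
    return (int16_t)(((int32_t)a * b) >> 15);
}
```

Much of the fixed-point conversion work consists of choosing Q formats for each signal, tracking the format through every arithmetic operation, and deciding where rounding and saturation are needed.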
The third step is to optimize the fixed-point reference code for embedded targets using processor-independent optimizations, usually in ANSI C. At this point, the code is ready to be compiled on an embedded processor and should yield a reasonably efficient implementation (compared to what you’d get by compiling the original algorithm reference code).
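A typical step-three optimization (one illustrative example, not taken from any specific codec) is to replace expensive operations with cheaper portable equivalents in ANSI C. Here, a modulo operation used for circular-buffer wrapping is replaced with a bitwise AND, which assumes the buffer length is a power of two:

```c
/* Processor-independent optimization example: circular-buffer index
 * update.  The AND form requires BUF_LEN to be a power of two. */
#define BUF_LEN  256
#define BUF_MASK (BUF_LEN - 1)

static unsigned advance_slow(unsigned idx)
{
    return (idx + 1) % BUF_LEN;   /* original reference code */
}

static unsigned advance_fast(unsigned idx)
{
    return (idx + 1) & BUF_MASK;  /* embedded-friendly version */
}
```

On many embedded compilers the `%` operator compiles to a library call, while the AND compiles to a single instruction; the trade-off is that buffer sizes are now constrained to powers of two, which should be documented in the reference code.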
The final step is to optimize the implementation for maximum speed or efficiency on a specific target processor. Depending on the processor, the tools, and the demands of the application, this may be done using assembly language, processor-specific C-language techniques, or both. The optimization effort can range from minimal to massive, but it’s common for signal processing algorithm code to require significant optimization work.

For the purposes of this article, we’ll assume that you’ve already developed and tested a floating-point version of the algorithm. We’ll focus on steps two and three, providing useful techniques for creating embedded-friendly reference code that gives software engineers a good starting point for step four.