By now, most people who work with processors—whether in data centers, PCs, mobile devices, or embedded systems—understand that parallel processing is the way to get both high compute performance and good energy efficiency for most applications. And most of these people also realize that programming parallel processors is challenging. There are many different types of parallel processors, including CPUs with single-instruction/multiple-data (SIMD) capabilities, multi-core CPUs, DSPs, GPUs, and FPGAs, among others.
Historically, my colleagues and I at BDTI have focused mainly on extracting maximum performance from CPUs and DSPs, often by writing hand-optimized assembly code. The advantage of assembly code is that it gives complete control over the processor—instruction selection and scheduling, register allocation, and so on—and this enables a highly skilled programmer to get the most out of the processor. A key disadvantage is that assembly coding is very low-level programming, and thus requires intimate knowledge of the processor and a great deal of time. Another key disadvantage is that assembly languages differ from processor to processor, so assembly code written for one processor is typically useless on a different processor.
Many high-level (at least, higher than assembly language) parallel programming languages have been developed over the years, but none have gained truly widespread use. In part, this is due to the reality that parallel programming is hard, even with a good high-level language. But it’s also due in part to the fact that many parallel programming languages have been developed with a particular processor architecture in mind, making them awkward or impossible to use with other types of processors.
As I wrote in 2012, OpenCL is a parallel programming language that was originally developed for GPUs. It first gained adoption among developers looking to accelerate PC and data center applications, but more recently, OpenCL has attracted interest for mobile and embedded devices. Suppliers of GPUs for mobile application processors are investing heavily to enable OpenCL on these GPUs. But OpenCL support is not limited to GPUs; suppliers of CPUs, DSPs, FPGAs, and other types of specialized processors are also implementing OpenCL. These developments give me hope that with OpenCL we will finally have a practical parallel programming language that enables portability across different processors of the same type (e.g., Brand A GPU to Brand B GPU, or 4-core CPU to 8-core CPU) and also across processors of different types (e.g., CPU to GPU, or GPU to FPGA).
This could not come at a better time for embedded vision applications. (For readers new to this column, we use the term "embedded vision" to refer to the practical implementation of computer vision in mobile devices, PCs, embedded systems, and the cloud.) We are seeing a very rapid proliferation of embedded vision for applications like augmented reality, automotive safety, and security. These applications have extremely diverse requirements; for example, some do all or most of their processing in the cloud, while others do all of their processing locally, under severe power consumption constraints. Given these diverse requirements, it shouldn't be surprising that we're seeing many types of processors used for embedded vision, from multi-core CPUs to FPGAs.
Vision applications typically contain a lot of parallelism, so optimizing them on parallel processors is a natural direction for developers looking to deliver the most functionality for the least power consumption. Particularly in the early days of these new applications, it’s difficult to know which type of processor will ultimately be best for each application. So, having the flexibility to migrate software from one processor type to another—without having to start over each time—can be extremely valuable. At the moment, OpenCL looks to me like the most promising way to enable this portability.