IBM’s Cell for Embedded?

Submitted by BDTI on Wed, 04/23/2008 - 18:00

IBM’s multicore Cell processor has garnered a lot of media attention over the last couple of years, as the multicore approach itself has become something of a juggernaut. BDTI recently investigated the current state of Cell products, and whether the architecture is likely to get significant traction in embedded applications.

The Cell processor incorporates eight “synergistic processing elements” (“SPEs”), each of which is a complex, superscalar processor with SIMD features for accelerating DSP algorithms. These processors are managed by a “POWER Processing Element” (PPE)—a separate superscalar CPU. The PPE is responsible for running the operating system and coordinating the activities of the SPEs. The SPEs have their own instruction sets and run their own programs (or off-the-shelf library functions), which are called from a program running on the PPE. Cell was originally designed as a high-performance processor for game consoles, and its top clock speed is about 3 GHz.
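To give a rough feel for this division of labor, here is a minimal sketch of PPE-side code using libspe2, the SPE runtime management library from IBM's Cell SDK. The SPE program handle name and the omitted error handling are illustrative assumptions, not a complete application.

```c
/* PPE-side sketch: load and run a program on one SPE using libspe2
 * (part of IBM's Cell SDK). The program handle name below is
 * hypothetical; a real build embeds it via the SPE toolchain. */
#include <stdio.h>
#include <libspe2.h>

/* Assumed to be produced by compiling the SPE-side program and
 * linking it into the PPE executable. */
extern spe_program_handle_t my_spe_program;

int main(void)
{
    spe_context_ptr_t spe;
    unsigned int entry = SPE_DEFAULT_ENTRY;
    spe_stop_info_t stop_info;

    /* Create an SPE context and load the SPE program into its local store. */
    spe = spe_context_create(0, NULL);
    if (spe == NULL || spe_program_load(spe, &my_spe_program) != 0) {
        fprintf(stderr, "failed to set up SPE\n");
        return 1;
    }

    /* Run the SPE program; this call blocks until the SPE stops. */
    if (spe_context_run(spe, &entry, 0, NULL, NULL, &stop_info) < 0)
        fprintf(stderr, "SPE run failed\n");

    spe_context_destroy(spe);
    return 0;
}
```

Because spe_context_run blocks, applications that want several SPEs working at once typically spawn one host-side thread per SPE context.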

According to IBM, the architecture is well suited for a variety of other high-performance computing applications, particularly those that include image processing or real-time signal processing.  IBM’s target markets for Cell include video surveillance, medical imaging, digital content distribution (particularly on-the-fly transcoding), aerospace and defense. IBM is also targeting common HPC applications like financial, chemical, and petroleum analyses. Meanwhile, IBM’s Cell partner, Toshiba, is busy taking Cell into consumer devices, such as televisions. (Toshiba recently demoed a Cell-powered flat-panel TV at CES.)

So far, IBM is not offering Cell chips. Instead, it is bringing the architecture to market in three board-level forms: server blades based on an IBM chip set; PCI accelerator boards; and IBM-designed custom boards. Of these, the PCI accelerator and custom boards are most relevant for embedded applications, though some embedded developers may start with a blade-based system and then migrate to a PCI accelerator or custom board.

It may seem surprising that IBM—which is, after all, a chip company—isn’t offering Cell chips. IBM’s perspective here is that the primary value of Cell lies at the system level; by limiting how Cell is distributed (and to whom), IBM can provide better system-level support and offer a more complete solution to its customers, including software tools, libraries, and other infrastructure components. According to IBM, selling chips directly requires a broader support infrastructure, and the company is not yet ready to take that step.

According to IBM, the key benefits that Cell provides in its target markets are very high performance coupled with ease of development.  The “ease of development” claim may have many engineers scratching their heads, since there is a perception that the processor is very complicated to program. This is an ongoing challenge for all vendors of multicore and massively parallel devices, and it’s not easily dismissed.

Compared to a single-core processor, Cell is definitely more challenging to program. But that’s probably not the right comparison to make, since Cell offers much higher performance and can tackle applications where single-core chips would be overwhelmed.  Compared to a high-performance FPGA, Cell is probably easier to use, and is programmed within a development paradigm that is fairly familiar to engineers who have worked with other, simpler programmable processors. Compared to other massively parallel chips, Cell has a small number of processing elements, but those elements are complex, superscalar processors rather than, for example, the simple ALU arrays used by some massively parallel architectures. There are proponents of both approaches, and not much agreement on which one yields an easier programming model.

The fact is, however, that massively parallel processors are always complex. The choice between a large number of simple processing elements and a small number of complex processors is, in effect, a choice between different types of complexity. Simple processing elements have a limited repertoire of capabilities, and so tend to be straightforward to use—at least, on a per-element basis. But it takes many of them working together to achieve high performance, and that’s where the complexity arises. Furthermore, other architectural considerations (such as whether processing elements run synchronously or asynchronously, whether memories are shared or distributed, etc.) can have a significant impact on how easy a processor is to program.

The trick to programming Cell, according to IBM, lies in rethinking how you associate data with instructions and in balancing the computational load across the chip’s processing elements. Engineers who have worked with AltiVec or other high-performance SIMD-oriented engines will probably have a head start here; creating highly parallel versions of algorithms is a skill that will be essential for Cell users.
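To make that concrete, the sketch below shows the general flavor of SPE-side code: each SPE has a small local store, so data is moved in and out explicitly with DMA (via the MFC intrinsics in spu_mfcio.h) and then processed with 128-bit SIMD operations (spu_intrinsics.h). The buffer size, DMA tag, and the use of the argp parameter to carry the data address are illustrative assumptions, not a tested kernel.

```c
/* SPE-side sketch: DMA a block of input into local store, scale it
 * with SIMD operations, and DMA the result back over the input block
 * in main memory. Buffer size, DMA tag, and the meaning of argp are
 * illustrative assumptions. */
#include <spu_intrinsics.h>
#include <spu_mfcio.h>

#define N        1024                 /* floats per block */
#define DMA_TAG  0

static vector float in_buf[N / 4]  __attribute__((aligned(128)));
static vector float out_buf[N / 4] __attribute__((aligned(128)));

int main(unsigned long long speid, unsigned long long argp,
         unsigned long long envp)
{
    (void)speid; (void)envp;

    /* Pull the input block from main memory into local store. */
    mfc_get(in_buf, argp, sizeof(in_buf), DMA_TAG, 0, 0);
    mfc_write_tag_mask(1 << DMA_TAG);
    mfc_read_tag_status_all();        /* wait for the DMA to complete */

    /* Process four floats per instruction with the SPU's SIMD unit. */
    vector float gain = spu_splats(2.0f);
    for (int i = 0; i < N / 4; i++)
        out_buf[i] = spu_mul(in_buf[i], gain);

    /* Push the results back to main memory. */
    mfc_put(out_buf, argp, sizeof(out_buf), DMA_TAG, 0, 0);
    mfc_write_tag_mask(1 << DMA_TAG);
    mfc_read_tag_status_all();

    return 0;
}
```

The real work, as IBM suggests, lies in restructuring an algorithm so that its data can be streamed through local store in blocks like this and processed in SIMD fashion.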

Of course, it’s also crucial to have good tools. To address this requirement, IBM has been working with RapidMind (a vendor of parallel programming tools for multicore processors) and has invested in math and image libraries.

We believe that Cell has the potential to be successful in the embedded market, in large part because of IBM’s vast expertise and resources. The company can use its resources to pursue a multi-pronged strategy: Work closely with a small number of customers to deliver highly customized solutions for applications with big volumes—like game consoles. In parallel, collaborate with a second tier of customers to develop custom boards and associated application software, and provide a more complete solution than might be found from other chip vendors. And while these two market strategies push the processor into a few key products, IBM can continue to invest in tools, software components, training, etc. until Cell’s application development infrastructure is sufficiently well developed to attract a broader range of customers.  This could take some time, but IBM is one of the few vendors that can probably afford to be patient. Most massively parallel chip vendors just don’t have that luxury.  
