Massively Parallel Processors for DSP, Part 1

In the last few years a number of start-up companies have announced massively parallel processors for embedded DSP applications. With their arrays of processing elements, these processors target high-end digital video, software-defined radio and other computationally demanding applications for which traditional DSP processors lack sufficient horsepower and ASICs are too inflexible or too costly to design. In some cases, massively parallel architectures are employed to reduce power consumption: a chip with many parallel resources can potentially accomplish the same work at a lower clock speed, and a lower clock speed may in turn permit a lower supply voltage, so the chip burns less power.
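To see where the power saving can come from, here is a rough sketch based on the textbook approximation for CMOS dynamic power, P = N × C × V² × f. The capacitance, voltage and frequency values are hypothetical and chosen only for illustration. The sketch assumes the workload parallelizes perfectly and ignores leakage power and interconnect overhead, so it should not be read as a model of any particular chip.

```python
def dynamic_power(num_units, capacitance, voltage, frequency):
    """Approximate CMOS dynamic power in watts: P = N * C * V^2 * f."""
    return num_units * capacitance * voltage ** 2 * frequency


# Hypothetical switched capacitance per processing element, in farads.
CAPACITANCE = 1e-9

# One processing element running fast at a higher supply voltage.
single = dynamic_power(1, CAPACITANCE, voltage=1.2, frequency=400e6)

# Four elements at a quarter of the clock rate, same voltage:
# same aggregate cycles per second, but no power saving yet.
parallel_same_v = dynamic_power(4, CAPACITANCE, voltage=1.2, frequency=100e6)

# Four elements at a quarter of the clock rate AND a reduced voltage
# (assumption: the slower clock makes the lower voltage feasible).
parallel_low_v = dynamic_power(4, CAPACITANCE, voltage=0.8, frequency=100e6)

for label, watts in [
    ("1 unit  @ 400 MHz, 1.2 V", single),
    ("4 units @ 100 MHz, 1.2 V", parallel_same_v),
    ("4 units @ 100 MHz, 0.8 V", parallel_low_v),
]:
    print(f"{label:<28}{watts:.3f} W")
```

Under these assumed numbers, spreading the work across four slower elements saves nothing by itself. The saving appears only once the supply voltage drops, because power scales with the square of the voltage; here the low-voltage parallel case uses less than half the power of the single fast element.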

Chips with a few processors on them have been widely deployed for many years. What’s new is the growing number of chips that contain tens, or even hundreds, of processing elements. There are significant differences among massively parallel chips, but because the technology is relatively new, there isn’t yet a clear taxonomy. Without one, it can be difficult to compare these chips to each other and to understand their potential strengths and weaknesses.

In this article, Part 1 of a two-part series, we’ll explain the key technology differentiators among the latest massively parallel chips. (See our earlier article for a discussion of mainstream DSP processors.) In Part 2 we’ll look at the new development tools and methodologies that vendors are using to try to make their chips easier to use.

Four Dimensions of Differentiation

For the purposes of this article, we’ll define four key dimensions of differentiation for massively parallel processor architectures: the granularity of the processing elements; whether the processing elements are homogeneous or heterogeneous; the method used to control the processing elements; and the method used to partition and distribute tasks across processing elements. Understanding where a given architecture fits in these dimensions of differentiation provides a framework for comparing the widely disparate massively parallel architectures available today.

Granularity

The first differentiator we’ll discuss is the granularity of the processing elements. Some chips contain arrays of complete processor cores, while others are built from lower-level elements, such as ALUs. Of the chips that use arrays of processors, some are based on complex CPUs, possibly VLIW-based, while others use very simple processor cores. Finer-grained processing elements are generally more flexible, but may require a more hardware-oriented programming approach (e.g., use of a hardware description language, or HDL, rather than a high-level programming language).
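To make the framework concrete, an architecture’s position along the four dimensions can be recorded in a small data structure and compared field by field. The sketch below is only an illustration. The class and function names, the two example chips and the placeholder strings for the control and partitioning methods are all hypothetical. The granularity and composition values are limited to the categories named in this article.

```python
from dataclasses import dataclass
from enum import Enum


class Granularity(Enum):
    ALU = "low-level element such as an ALU"
    SIMPLE_CORE = "very simple processor core"
    COMPLEX_CORE = "complex, possibly VLIW-based, processor core"


class Composition(Enum):
    HOMOGENEOUS = "homogeneous"
    HETEROGENEOUS = "heterogeneous"


@dataclass(frozen=True)
class ArchitectureProfile:
    """Where one (hypothetical) chip sits on the four dimensions."""
    name: str
    granularity: Granularity
    composition: Composition
    control_method: str       # free-form label; no fixed categories assumed
    partitioning_method: str  # free-form label; no fixed categories assumed


DIMENSIONS = ("granularity", "composition",
              "control_method", "partitioning_method")


def differing_dimensions(a, b):
    """Return the names of the dimensions on which two profiles differ."""
    return [d for d in DIMENSIONS if getattr(a, d) != getattr(b, d)]


# Two made-up chips, used only to exercise the comparison.
chip_a = ArchitectureProfile(
    "Chip A", Granularity.COMPLEX_CORE, Composition.HOMOGENEOUS,
    control_method="placeholder-control-1",
    partitioning_method="placeholder-partitioning-1",
)
chip_b = ArchitectureProfile(
    "Chip B", Granularity.ALU, Composition.HOMOGENEOUS,
    control_method="placeholder-control-2",
    partitioning_method="placeholder-partitioning-1",
)

print(differing_dimensions(chip_a, chip_b))
```

Keeping the control and partitioning fields as free-form strings is a deliberate choice in this sketch: it avoids committing to a fixed set of categories for those two dimensions.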
