Inside DSP on Digital Video: Processors for video—Know your options

To create a successful digital video product, you need to choose the right processor. Sounds simple—but of course, it isn’t. A big part of the challenge is that there are so many types of processors from which to choose: general-purpose CPUs, FPGAs, DSPs, configurable processors, and fixed-function chips, among others.

A further complication is that digital video is a fast-moving field, with standards that are shifting and evolving. As a result, a processor’s ability to adapt to changes tends to be more important in digital video than in many other applications—but such flexibility usually comes at the cost of reduced efficiency.

Choosing a processor inevitably requires some compromises, but it’s crucial to know how to pick one that won’t compromise the success of your product.

One doesn’t fit all
Digital video technology is used in products ranging from cellular phones to personal video recorders (PVRs). While many video products share some common functionality (for example, most use video compression algorithms to compress and/or decompress video), they also have significant differences. Portable products place a high priority on energy efficiency; line-powered products typically don’t. Products designed for the living room usually have much higher video resolution than hand-held products.
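To get a rough feel for how much resolution changes the processing load, it can help to compare raw pixel rates. The sketch below is only illustrative: the two formats (QCIF at 15 frames per second for a hand-held device, 720x480 at 30 frames per second for a living-room device) are assumptions chosen for the example, not figures from any particular product.

```python
def pixel_rate(width, height, frames_per_second):
    """Return the number of pixels a product must process each second."""
    return width * height * frames_per_second

# Illustrative formats (assumptions, not taken from a specific product):
handheld = pixel_rate(176, 144, 15)     # QCIF at 15 frames/s
living_room = pixel_rate(720, 480, 30)  # standard definition at 30 frames/s

print(handheld)                           # 380160
print(living_room)                        # 10368000
print(round(living_room / handheld, 1))   # 27.3
```

Under these assumptions the living-room product handles roughly 27 times as many pixels per second. Pixel rate is only a first-order proxy for computational load, but it shows why a processor that suits one product class may be badly matched to another.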

In short, one processor won’t fit all. Even one type of processor won’t fit all. The key to success in processor selection lies in knowing what’s available, and understanding the strengths and weaknesses of each processor type.

A few of your favorite things
Because there are so many processor options, it isn’t practical to look at all—or even a significant subset—of them in detail. Instead, a hierarchical approach can make the process manageable: Use your most important criteria to weed out unsuitable candidates early on.

Criteria commonly used for making the first cut include:

Speed. Digital video tasks, like many other types of signal processing tasks, place heavy computational loads on processors. Carefully analyze whether a processor has sufficient speed for the target application, preferably using video-oriented benchmarks such as the BDTI Video Benchmarks™.
Price. Chip price is important, but cost per channel or overall system cost may be more important.
Energy efficiency. In most cases, it’s more meaningful to evaluate energy efficiency than power consumption, since energy use governs battery life.
Flexibility. Some classes of processors are more flexible than others and can accommodate late changes in product features or allow field upgrades, such as adding support for a new compression algorithm. In general, however, the more flexible the processor, the less efficient it is in cost and energy use.
Quality of development tools. Whether the processor has tools that are designed to support development of signal processing applications (or better yet, video applications) can have a significant effect on development time, and hence on time to market.
Compatibility with earlier processors. This is typically important if you expect to reuse software from an earlier product.
Vendor roadmap. Does the vendor’s product roadmap line up well with your plans for follow-on products? Will the processor continue to be supported—or upgraded—over the life of your product?
Availability as chip or licensable core. Some processors are sold as packaged off-the-shelf chips, and some as licensable intellectual property—often called licensable cores—for use in building custom chips. Most of the processor categories discussed here include both packaged chips and licensable cores.

As we will show, each class of processor makes different trade-offs in these areas.
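The first-cut approach described above can be sketched in a few lines of code. This is a minimal illustration, not a selection tool: the candidate names, prices, power figures, and thresholds are all hypothetical, and a real evaluation would use measured benchmark data. The sketch also shows why energy per frame (power multiplied by processing time) can rank candidates differently than power consumption alone.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    name: str
    chip_price_usd: float     # price of one chip
    channels: int             # video channels one chip can handle
    power_watts: float        # average power while processing video
    seconds_per_frame: float  # processing time for one frame
    field_upgradable: bool    # can new algorithms be added after shipping?

def cost_per_channel(c):
    return c.chip_price_usd / c.channels

def energy_per_frame_joules(c):
    # Energy = power x time; this, not power alone, governs battery life.
    return c.power_watts * c.seconds_per_frame

def first_cut(candidates, max_cost_per_channel, max_joules_per_frame, need_upgrades):
    """Keep only the candidates that pass every first-cut criterion."""
    survivors = []
    for c in candidates:
        if cost_per_channel(c) > max_cost_per_channel:
            continue
        if energy_per_frame_joules(c) > max_joules_per_frame:
            continue
        if need_upgrades and not c.field_upgradable:
            continue
        survivors.append(c.name)
    return survivors

# Hypothetical candidates; every number here is made up for illustration.
candidates = [
    Candidate("A", 20.0, 4, 1.0, 0.010, True),
    Candidate("B", 8.0, 1, 0.5, 0.030, True),
    Candidate("C", 12.0, 2, 0.4, 0.020, False),
]

print(first_cut(candidates, max_cost_per_channel=7.0,
                max_joules_per_frame=0.012, need_upgrades=True))  # ['A']
```

Note that hypothetical candidate A draws twice the power of candidate B yet uses less energy per frame, because it finishes each frame three times faster. It also has the highest chip price but the lowest cost per channel.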

The lineup

In this article, we focus on six classes of processors commonly used for digital video: fixed-function engines, application-specific standard products (ASSPs), media processors, DSPs, embedded RISC processors, and FPGAs. We cover them in order from the most specialized to the most flexible. For each category, we discuss its strengths and weaknesses and examine a specific product.

Fixed-function engines use hardwired processing structures for maximum efficiency; they do not use an instruction stream and are not programmable. Hard-wired logic sacrifices flexibility in exchange for exceptional processing speed, energy efficiency, and—often—cost effectiveness.

Using fixed-function engines can simplify system design and testing. Because fixed-function engines are not programmable, product developers don’t have to learn new programming tools or deal with integrating multiple software modules. And they don’t have to figure out whether multiple tasks executing on the processor may interact in undesirable ways, interfering with the real-time behavior of the system.

Fixed-function engines are typically provided as licensable intellectual property (IP) for integration into custom chips. In this form, a fixed-function engine is best suited to high-volume applications such as cellular phones. Fixed-function engines are sometimes provided as chips. A fixed-function video chip—such as an MPEG-2 decoder chip—can be a cost-effective way to add functionality to an existing product, particularly when the product has a host processor that can handle the required control and user interface functions.

Figure 1 shows Hantro’s 5150 MPEG-4 video decoder, an example of a fixed-function video engine sold in IP form. As illustrated in the figure, the engine is intended to be used as a coprocessor, attached to a general-purpose processor that handles some of the less demanding sub-tasks required for MPEG-4 decoding.

Figure 1. Hantro 5150 MPEG-4 video decoder fixed-function engine.

The key drawback of fixed-function hardware is its lack of flexibility. Since it is not programmable, product developers cannot easily modify fixed-function hardware to support new standards or different features. This is a critical concern because many video applications are still relatively immature, with rapidly changing standards and features.

Fixed-function engines are often used as components of application-specific standard products, which we discuss next.

Application-specific standard products (ASSPs) are highly integrated application-specific chips. In contrast to application-specific integrated circuits (ASICs), which are designed by a system house for use in its own products, ASSPs are designed by chip companies and offered as off-the-shelf chips to multiple system developers. Since developing a complex chip is expensive and time-consuming, ASSPs are typically available only for well-established applications where high volumes already exist, or are anticipated.

The Zoran Vaddis 5R, shown in Figure 2, is a highly specialized chip targeting audio and video processing in a DVD recorder. The key algorithms required are well defined: most notably, MPEG-2 video compression and decompression.

Though the Vaddis 5R includes two RISC processors, it uses fixed-function hardware accelerators for the most compute-intensive tasks, like MPEG-2 video decoding and color space conversion. For that reason, the Vaddis 5R (and other ASSPs like it) shares the strengths and weaknesses associated with fixed-function engines: good performance and energy efficiency, but limited flexibility.

This limited flexibility means system designers have limited opportunity to differentiate a product from other products that use the same ASSP. It also means that system designers are highly dependent on the chip vendor’s roadmap, since a new chip will be required to support significantly different functionality—as might be needed in a follow-on product.

ASSPs that rely primarily on programmable processors for computationally intensive tasks sacrifice energy and cost efficiency to gain flexibility. ASSPs of this type are generally bundled with key software components such as video decoders and device drivers, freeing the system developer from much of the low-level software development work. Nevertheless, software development and integration can require significant effort in comparison to that required when using ASSPs based on fixed-function hardware.

Media processors lie between ASSPs and digital signal processors (DSPs) on the specialization-vs.-flexibility continuum. Media processors are optimized for tasks associated with audio and video processing, rather than for a broad range of signal processing tasks, as DSPs are. Media processors are typically heterogeneous multiprocessors, incorporating a main processing engine similar to a DSP, plus two or three specialized coprocessors, and audio- and video-specific peripherals.

Figure 3 illustrates an example media processor, the Philips PNX1500. Typical of media processors, the PNX1500 is based on a powerful, highly parallel processor core that is efficient at video processing tasks. Also typical of media processors, the PNX1500 includes a few fixed-function hardware accelerators and specialized peripheral devices. The main processor core, which is programmable by the system designer, handles complex video tasks like compression.
Figure 3. Philips PNX1500.

Like the Zoran Vaddis 5R, the PNX1500 is well suited to MPEG-2 decoding. But unlike the Zoran ASSP, the PNX1500 is flexible enough to be used with other video compression standards such as H.264. This flexibility comes at a price, of course: a software-based video decoder is generally less energy- and cost-efficient than fixed-function hardware.

The heterogeneous multiprocessor nature of media processors makes software development more difficult than it is on other programmable processors. For example, to implement a given video task, it is typically necessary to program two or more processing elements and coordinate their interactions. To help address this disadvantage, media processor vendors often provide optimized software component libraries.

Media processor vendors typically stress the use of C or C++ for software development and don’t recommend—or support—assembly language. The focus on high-level language software development is intended to insulate the programmer from many of the complexities of the processor architecture. The downside is that the programmer must rely on the compiler to generate efficient code—and this isn’t always realistic. Developers may need to invest considerable effort hand tuning their high-level language code for best performance.

Digital signal processors, or DSPs, are designed for a range of signal processing applications. DSPs typically employ less video-oriented specialization and parallelism than media processors. To compensate for their lower parallelism, DSPs typically must operate at higher instruction rates than media processors for a given application. Higher instruction rates can complicate system design and increase energy consumption. On the other hand, DSPs require lower clock speeds than embedded RISC processors (discussed below) to handle video tasks. Key advantages for DSPs are their flexibility and strong application development tools. Figure 4 shows an example video-oriented DSP, the Texas Instruments TMS320DM642.
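The relationship between parallelism and clock rate described above can be made concrete with a back-of-the-envelope calculation. The workload and the operations-per-cycle figures below are hypothetical and chosen only to show the trend; the sketch also ignores memory stalls and other real-world effects.

```python
def required_clock_mhz(ops_per_second, ops_per_cycle):
    """Clock rate needed to sustain a workload at a given level of parallelism."""
    return ops_per_second / ops_per_cycle / 1e6

# Hypothetical video workload: 2 billion operations per second.
workload = 2e9

# Hypothetical sustained operations per cycle for each processor class.
for name, ops_per_cycle in [("media processor", 8),
                            ("DSP", 4),
                            ("embedded RISC", 1)]:
    print(name, required_clock_mhz(workload, ops_per_cycle), "MHz")
# media processor 250.0 MHz
# DSP 500.0 MHz
# embedded RISC 2000.0 MHz
```

Under these assumptions, halving the sustained parallelism doubles the clock rate needed for the same task, which is why less parallel processors tend to need faster clocks and, often, more energy.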

Historically, DSPs have been poor compiler targets, and DSP compilers have been inefficient. But recent years have brought a trend toward developing more compiler-friendly DSPs. Also, some DSP vendors and independent tool providers have invested heavily in developing compilers. As a result, DSP compiler quality has risen dramatically. Yet obtaining maximum performance often requires hand-optimized assembly code. The good news is that DSP vendors often provide good assembly language programming tools. But the architectures themselves are sometimes complex, making assembly programming challenging. Because video applications are an important target of DSPs, DSP development tools often have features that aid developers of video applications. For example, data visualization capabilities can be valuable when debugging video processing software.

An important difference between typical DSPs and typical embedded RISC processors is support for operating systems. DSPs typically support a small number of real-time OSes, but do not support “full-featured” OSes like Windows CE. Consequently, many system designs use a DSP to handle video processing and an embedded RISC processor to run the OS and handle other non-video tasks. Recently, however, some DSP vendors have made sophisticated operating systems such as Linux available on their processors.

Historically, DSP vendors have not put a priority on maintaining compatibility from generation to generation. This makes it harder to re-use application software when moving from one processor generation to the next. This is changing, however, with several new DSPs offering some level of compatibility with their predecessors. For example, the TMS320C64x is binary compatible with its predecessor, the TMS320C62x.

Embedded RISC processors are popular for a wide range of embedded applications. Historically, they have been general-purpose machines with few or no application-specific features. RISC processors are often found in the host processor role in video products, typically alongside a specialized video processor.

Until recently, RISC processors were only fast enough to handle very low-end video processing tasks. Today, however, increasing clock speeds are enabling embedded RISC processors to take on more demanding digital video workloads. In addition, embedded RISC processors are gaining increased parallelism and adding specialized video features. While demanding tasks like high-resolution video compression remain beyond the capabilities of embedded RISC CPUs, these processors are increasingly pressed into service for less-demanding video tasks. Figure 5 shows an example embedded RISC processor, the Intel XScale PXA27x.

The PXA27x is based on Intel’s XScale core, which itself is based on the popular ARMv5TE instruction set. The PXA27x adds DSP enhancements to the ARM instruction set via its Wireless MMX extensions. Its maximum clock speed of 624 MHz is relatively high for an embedded RISC processor. In combination with its DSP enhancements, this clock speed makes the PXA27x a capable performer in a number of video processing tasks.

Although often less efficient at video tasks than other types of processors, embedded RISC processors enjoy a number of advantages in the realm of application software development. For example, embedded RISC processors are often backed by a sophisticated software development infrastructure and legions of programmers. And embedded RISC processors are generally easier to program than the other processor classes discussed here. On the downside, the tools and software development infrastructure for embedded RISC processors typically offer less support for video processing software development than do the tools and infrastructure provided for many other types of processors discussed here.

Roadmaps for embedded RISC architectures are generally clearer than the roadmaps of the other processor classes discussed here, simplifying planning for system developers who are contemplating multiple generations of products. And backwards compatibility is almost always maintained. Another advantage of many RISC CPU architectures is multivendor support—that is, multiple vendors offer chips based on the same core architecture. Unfortunately, advanced features, like the Wireless MMX extensions, are often limited to one vendor.

Field-programmable gate arrays (FPGAs) might not be the first thing that comes to mind when thinking about a video processor, but their flexibility and high parallelism (and thus, potentially, high speed) can be a great match for tough video processing applications.

An FPGA contains an array of reconfigurable logic blocks, programmable interconnect resources, I/O blocks, and (in some cases) specialized fixed-function blocks.

FPGAs like Altera’s Stratix-II can be configured to match the requirements of an application and can provide massive computational power and memory bandwidth. Stratix-II is a high-end FPGA family that includes specialized fixed-function blocks, such as multipliers, PLLs, and memory blocks—all of which can boost its performance in video processing algorithms. Figure 6 shows an MPEG-2 decoder implementation on the Stratix-II EP2S15.

Figure 6. A block diagram of an MPEG-2 video decoder implementation (left) on an Altera Stratix-II EP2S15 (right).

FPGAs are the most flexible processor type, and FPGA-based designs can be readily upgraded to implement new features or adapt to emerging standards. Unfortunately, this flexibility comes at the price of reduced energy efficiency and cost efficiency. For example, FPGAs are typically far less energy efficient than ASICs or ASSPs, and FPGAs can cost hundreds or even thousands of dollars apiece. FPGA vendors, however, have recently introduced more cost-effective devices, making them attractive for a broader range of applications.

Another downside to FPGAs is that the application development effort is much higher than that associated with programmable processor software development, and fewer engineers are skilled in FPGA design than in software development.

While an FPGA can be a good match for video algorithms, a programmable processor is usually still needed to run things like an OS. For this reason, FPGAs are typically used in conjunction with one or more programmable processors. However, with the advent of “soft” processor cores designed for implementation within an FPGA, like Altera’s Nios II and Xilinx’s MicroBlaze (both 32-bit RISC processor cores), instruction set processors can now be incorporated within an FPGA.

Alternatives

In addition to the six categories of processors discussed above, there are at least four other processor types that may be suitable for some digital video applications. These include the following:

Embedded PC CPUs are general-purpose processors, and thus have few (if any) features that are specifically designed for video processing. Vendors often recycle older, PC-oriented architectures and add more on-chip integration to create variants specifically designed for embedded applications. These embedded PC CPUs are generally unsuitable for heavy-duty video processing, and so they are frequently coupled with a specialized “video” processor that handles the core video processing tasks.
Configurable processors are licensable processor cores that can be customized by the core licensee for use in custom chips. The customization process takes place before the chip is fabricated; once fabricated, the processor hardware is fixed.
Reconfigurable processors are similar to configurable processors, except that they can be reconfigured for different tasks after the chip is fabricated, thus allowing different configurations to be selected at run-time.
Application-specific instruction processors (ASIPs) are processors that are custom designed for the application at hand. ASIPs are not sold as packaged processors or as licensable processor cores; instead, vendors offer tools that enable chip designers to create their own ASIPs.

Because digital video is such a hot market, expect to see even more processor types introduced in the coming years. These will probably combine elements of the types of processors we’ve discussed here, and their tradeoffs will reflect those of the constituent approaches.

Hedging bets
Clearly, no single processor or processor type is best for all digital video applications. Classes of processors that offer some flexibility are becoming popular, but fixed-function hardware has its place, too. In part, it’s a question of how much you want to hedge your bets—and you must consider all the solutions.

Jennifer Eyre of BDTI contributed to this article.
