It’s generally accepted that, for processing engines, there is a trade-off between efficiency and generality. The more a chip is geared towards a specific application, the more efficient it’s likely to be (in terms of speed, energy consumption, and cost). On one end of the spectrum you have traditional FPGAs, which are completely general-purpose, and on the other are fixed-function chips, which are completely application specific. In between these extremes lie various types of processors, including DSPs.
When BDTI recently published an independent benchmarking report showing that high-performance FPGAs are not only much faster, but also more cost-effective than high-performance DSP processors on certain demanding workloads, we heard from a lot of confused engineers. They were convinced that FPGAs couldn’t be more efficient than DSPs. After all, they said, in order to provide reconfigurability, FPGAs spend transistors extravagantly. Implementing a simple logic function (like a two-input NAND) on an FPGA requires a lot more transistors than are used to provide the same function in a processor’s ALU.
They have a point, but efficiency comes in many flavors. Think, for example, about how traditional processors utilize silicon area. A typical high-performance DSP devotes just a tiny fraction of its area to computation; most of it is soaked up by memory and other structures (e.g., buses, DMA engines) that are dedicated to moving data around the chip. An FPGA, in contrast, can use much more of its silicon area for computations. As a consequence, if you run a demanding, highly parallel DSP algorithm on a typical DSP processor, you’ll see that only a few small portions of the silicon are consistently active. An FPGA running the same application, in contrast, will be using lots of its resources most of the time. So which is more efficient?
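One way to make that question concrete is a back-of-the-envelope model. The sketch below is only an illustration: the `Chip` fields, the two helper functions, and every number in it are hypothetical placeholders, not figures from the BDTI report. It assumes that delivered throughput per unit of silicon scales with the share of the die that is actively computing, divided by the relative transistor cost of each operation.

```python
from dataclasses import dataclass


@dataclass
class Chip:
    name: str
    compute_area_fraction: float  # share of die area devoted to computation (hypothetical)
    compute_utilization: float    # share of that compute area kept busy by the workload (hypothetical)
    transistors_per_op: float     # relative transistor cost of one operation (hypothetical)


def active_area_fraction(chip: Chip) -> float:
    """Fraction of the whole die that is doing useful computation."""
    return chip.compute_area_fraction * chip.compute_utilization


def relative_throughput_per_area(chip: Chip) -> float:
    """Active compute area discounted by how many transistors each operation costs."""
    return active_area_fraction(chip) / chip.transistors_per_op


# Placeholder values chosen only to illustrate the argument, not measured data.
dsp = Chip("DSP", compute_area_fraction=0.10, compute_utilization=0.5, transistors_per_op=1.0)
fpga = Chip("FPGA", compute_area_fraction=0.60, compute_utilization=0.8, transistors_per_op=5.0)

for chip in (dsp, fpga):
    print(chip.name, active_area_fraction(chip), relative_throughput_per_area(chip))
```

With these made-up inputs, the FPGA pays five times as many transistors per operation yet still comes out ahead, because far more of its silicon is busy. Different inputs would reverse the result, which is the point: neither per-operation transistor cost nor silicon utilization settles the question on its own.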
Generality and efficiency are complex, multi-faceted concepts. It’s easy to focus on one particular weakness of a given approach (such as the extra transistors used by FPGAs, or the fact that most of the transistors in a processor chip are idle most of the time), but in the end what matters is the actual performance, efficiency, and flexibility that a chip delivers for a specific application.