In early 2010, Xilinx previewed its vision for what it calls an “extensible processing platform”—a highly integrated combination of a high-performance embedded processor subsystem and an FPGA. Earlier this month, that vision came one step closer to reality with Xilinx’s disclosure of details of its first extensible processing platform product family. The family, named Zynq-7000, initially comprises four chips. Xilinx says Zynq samples will become generally available in the first half of 2012.
Zynq is not the first product to integrate a hard-wired processor and an FPGA on a single chip. For example, Xilinx itself currently offers high-end FPGAs integrating a PowerPC core. But Zynq is noteworthy in a number of respects. Perhaps most interesting, Xilinx positions Zynq as being “processor-centric”: when a Zynq chip powers up, it first boots an operating system like a processor, rather than first loading a bit stream to configure the programmable logic like an FPGA. This enables embedded systems designers and software developers to start using Zynq without first learning how to do FPGA design. (Of course, they’ll need to learn FPGA design—or find an FPGA designer—to use the FPGA portion of the chip, but more on that topic later.)
Zynq’s processor-centric personality is enabled by a high degree of integration within the processor subsystem. Zynq does not simply integrate a processor core into the FPGA. Rather, as shown in Figure 1, it integrates a complete processor subsystem comprising the processor core (actually two cores, since the processor is a dual-core Cortex-A9), two levels of cache memory, DDR controller, and an array of I/O interfaces including USB, gigE, I2C, and SPI. So, without utilizing the FPGA portion of the chip at all, a Zynq chip is capable of functioning like a typical high-performance embedded processor chip.
Figure 1. Zynq chip block diagram, emphasizing the CPU subsystem.
Another distinctive aspect of Zynq is Xilinx’s choice of a high-performance CPU. Xilinx expects Zynq users to utilize the FPGA portion of the chip to implement their most performance-intensive functions, offloading the CPU, but the company did not scrimp in its choice of CPU. Each of the cores in the dual-core Cortex-A9 CPU includes support for NEON single-instruction, multiple-data instruction extensions as well as a floating-point unit. At a top speed of 800 MHz, the CPU subsystem will deliver substantial performance.
Xilinx’s choice of an ARM core is timely, since the ARM architecture has developed impressive momentum in a wide range of embedded applications far beyond its original stronghold of mobile phones. By incorporating the Cortex-A9 into Zynq, Xilinx will take advantage of ARM’s momentum with respect to enabling software and tools, including operating systems, device drivers, and middleware such as media players.
A key advantage to having the processor and FPGA on the same piece of silicon is the ability to have a very wide, fast connection between the two. Prior BDTI analyses have shown that FPGAs can be very cost-effective engines for executing compute-intensive digital signal processing functions. But FPGAs are also more difficult to use than processors. As a result, system designers often partition their computing workload, offloading to the FPGA those portions of the workload that won’t fit on the processor, and leaving the rest on the processor. A pitfall with this approach is that the overhead of moving data back and forth between the processor and the FPGA can easily become a performance bottleneck, limiting the performance gain enabled by the FPGA coprocessor. In the Zynq chips, Xilinx has provided more than 3,000 data and control signals connecting the processor subsystem to the FPGA.
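The bottleneck argument above can be made concrete with a back-of-the-envelope model. The sketch below is illustrative only: the function name `offload_speedup`, its parameters, and the sample numbers are assumptions made for this example, not Xilinx or BDTI figures. It also assumes the CPU work, the FPGA work, and the data transfers run one after another with no overlap.

```python
def offload_speedup(total_s, offload_fraction, accel_factor, transfer_s):
    """Estimate end-to-end speedup when part of a workload moves to an FPGA.

    total_s          -- run time of the whole workload on the CPU alone (seconds)
    offload_fraction -- share of that time spent in the code moved to the FPGA (0..1)
    accel_factor     -- how many times faster the FPGA runs the offloaded part
    transfer_s       -- extra time spent moving data between CPU and FPGA (seconds)

    Assumes serial execution: CPU part, FPGA part, and transfers do not overlap.
    """
    if total_s <= 0 or accel_factor <= 0 or transfer_s < 0:
        raise ValueError("times and acceleration factor must be positive")
    if not 0.0 <= offload_fraction <= 1.0:
        raise ValueError("offload_fraction must be between 0 and 1")

    cpu_s = total_s * (1.0 - offload_fraction)          # work left on the CPU
    fpga_s = total_s * offload_fraction / accel_factor  # accelerated work
    return total_s / (cpu_s + fpga_s + transfer_s)


if __name__ == "__main__":
    # Hypothetical workload: 80% of a 1-second job offloaded, FPGA 20x faster.
    for transfer in (0.0, 0.1, 0.3):
        s = offload_speedup(1.0, 0.8, 20.0, transfer)
        print(f"transfer overhead {transfer:.1f} s -> speedup {s:.2f}x")
```

With these made-up numbers, a 0.3-second transfer cost cuts the gain from roughly 4.2x to under 2x, which is the kind of erosion a wide on-chip connection is meant to reduce.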
Beyond the hardware aspects of communicating between the processor subsystem and the FPGA, though, there is the question of how software running on the CPU will interface with functions implemented in the FPGA, and vice versa. Today, system designers using separate processor and FPGA chips face a mountain of low-level reinventing-the-wheel work in order to get the two chips working together effectively. For Zynq to realize its potential, Xilinx will need to raise the bar not only on integration of the hardware blocks, but also with respect to the programming model and tools provided for using them in an integrated fashion. So far, Xilinx hasn’t said much about how it intends to tackle this challenge. Xilinx’s recent acquisition of high-level synthesis tool vendor AutoESL could help in this regard, but by itself high-level synthesis is only part of the solution.
Xilinx lists a wide range of target applications for Zynq chips, with emphasis on applications in data- and compute-intensive realms such as smart surveillance cameras, medical imaging systems, routers, and wireless equipment. The initial four family members differ primarily in the size of the FPGA (ranging from approximately 28,000 to 256,000 logic cells and from 80 to 760 DSP blocks) and the availability of high-speed serial transceivers. Regarding pricing, Xilinx says that the smallest Zynq chips will be priced below $15 “in high volumes.”
For system designers currently using an FPGA in combination with a CPU, Zynq’s potential appeal is obvious—it’s the usual integration story of better performance with reduced size, power, and cost. For designers currently relying mainly on embedded processors, the potential performance upside of Zynq is likely to be intriguing—and could become compelling if Xilinx can deliver an integrated, software-centric programming model, development methodology, and tool chain.