Calypto's Catapult 8 HLS: C-Based Hardware Design Matures

Submitted by BDTI on Wed, 12/17/2014 - 22:03

High-level synthesis (HLS) is a trend that BDTI has been following for quite some time, beginning with its several-decade-old academic foundations. Most commercial high-level synthesis efforts have focused on C-based design (whether ANSI C, C++ or SystemC), with toolsets that translate functionality defined at the behavioral level into logic at the register transfer level (RTL). In 2010, BDTI published certified HLS tool evaluation results for AutoESL's AutoPilot and Synopsys' Synphony C Compiler (previously known as Synfora PICO). High-level synthesis is particularly relevant to digital signal processing designs, where the focus is on implementing algorithms in dedicated logic.
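To make the "behavioral level" starting point concrete, here is a minimal sketch of the kind of C++ description a digital signal processing designer might write: a small FIR filter with fixed loop bounds and statically sized storage. It is plain C++ with no tool-specific types or pragmas; the names `FirFilter`, `filter_frame` and `kTaps`, and the tap count itself, are hypothetical and chosen only for illustration. Whether any particular HLS tool would accept this source unchanged is an assumption, not something the article states.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical tap count, chosen only for illustration.
constexpr std::size_t kTaps = 4;

// Behavioral description of a direct-form FIR filter. The fixed-size arrays
// and constant loop bounds are the style commonly used for hardware-oriented
// C++, since they map naturally onto registers and multiply-accumulate logic.
class FirFilter {
public:
    explicit FirFilter(const std::array<std::int16_t, kTaps>& coeffs)
        : coeffs_(coeffs), delay_line_{} {}

    // Consume one input sample and produce one output sample.
    std::int32_t step(std::int16_t sample) {
        // Shift the delay line by one position (a shift register in hardware).
        for (std::size_t i = kTaps - 1; i > 0; --i) {
            delay_line_[i] = delay_line_[i - 1];
        }
        delay_line_[0] = sample;

        // Multiply-accumulate across all taps.
        std::int32_t acc = 0;
        for (std::size_t i = 0; i < kTaps; ++i) {
            acc += static_cast<std::int32_t>(coeffs_[i]) * delay_line_[i];
        }
        return acc;
    }

private:
    std::array<std::int16_t, kTaps> coeffs_;
    std::array<std::int16_t, kTaps> delay_line_;
};

// Run the filter over a buffer of n samples.
void filter_frame(FirFilter& fir, const std::int16_t* in,
                  std::int32_t* out, std::size_t n) {
    assert(in != nullptr && out != nullptr);
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = fir.step(in[i]);
    }
}
```

Feeding a unit impulse through `filter_frame` returns the coefficients in order, which is a quick way to check the description in software before any hardware is generated. Note that nothing in the source says how many multipliers to use or how many clock cycles each sample takes; those are the implementation decisions an HLS flow is meant to settle.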

HLS industry maturation has led to predictable consolidation. Shortly after BDTI published its certification results for Synfora's PICO, for example, Synopsys acquired the company's technology, engineering resources and other assets. And just a few months later, Xilinx bought AutoESL; the fruits of that purchase can be found in the HLS facilities integrated into the System Edition of Xilinx's Vivado FPGA toolset. Earlier this year, Cadence bought Forte Design Systems, another provider of SystemC-based HLS and arithmetic IP. And three years ago, Calypto Design Systems acquired Mentor Graphics' Catapult HLS technology and development team; as part of the transaction, Mentor took an investment stake in Calypto.

Calypto had been a high-level language (HLL)-focused supplier since its 2002 founding; the company released the SLEC (Sequential Logic Equivalence Checking) RTL verification tool in 2005, followed by the PowerPro power consumption optimization product one year later. Catapult HLS, originally unveiled by Mentor Graphics in 2004, has to date been used on "thousands of tapeouts," according to Calypto's HLS Product Marketing Manager, Bryan Bowyer, and Vice President of Marketing, Mark Milligan. The 2011 acquisition of Catapult HLS, noted Bowyer and Milligan, enables the company to offer a full HLL product portfolio to its customers, currently more than 130 in number.

The recently introduced Catapult HLS v8 addresses two fundamental industry trends, one economic and the other driven by chip power consumption. Decreasing per-transistor cost improvements at advanced lithographies are compelling chip designers to remain at the 28 and 20 nanometer process nodes longer than previous nodes; in the absence of "automatic" process-driven improvements in silicon area, performance and power consumption, other means of accomplishing these objectives must be harnessed (Figure 1). And, for those SoC designs that do harness FinFET-based 14 nm processes, dynamic power draw is of increasing concern. While the leakage current for a unit of silicon area decreases due to FinFET's standby power stinginess, the increased number of transistors contained within that silicon area drives up dynamic current draw versus 28 nm and 20 nm processes.



Figure 1. Designers are no longer able to count on steadily decreasing transistor costs with each lithography advancement, extending the viability of today's mainstream 28 and 20 nm process nodes (top, courtesy International Business Strategies). And when integration needs compel a migration to 14 nm, dynamic power draw cannot be ignored (bottom, courtesy Cavium Networks at SNUG 2013).

Development of Catapult 8 began in 2011. Although it was just announced, Calypto has been migrating its existing customer base from previous Catapult HLS versions to "limited access" v8 beta builds (in parallel with continuing to release incremental improvements to previous versions) over the past year. The migration effort is nearly complete at this point, say Bowyer and Milligan, and the year-long "limited access" program has produced a solid v8 production release. Calypto views Catapult v8 as a third-generation HLS offering, with VHDL- and Verilog-based first-generation products having appeared in the late 1990s. Since they offered only a limited rise in the design abstraction level, their perceived value (and their market impact) was muted.

Second-generation products such as Mentor's Catapult C emerged in the early-to-mid 2000s. They moved to C-based high-level languages, further raising the design abstraction level, and realized big gains in performance, power consumption and silicon efficiency via their top-down optimization algorithms. Further refinements toward the end of that decade improved control logic synthesis results. Third-generation toolsets like Catapult 8 expand beyond traditional top-down approaches to embrace more designer-controlled bottom-up techniques (Figure 2). The more comprehensive design management and assembly system offered in this generation includes a library-based scheme that optionally allows designers to import pre-synthesized VHDL and Verilog IP function blocks. And the supported "divide and conquer" methodology in Catapult 8 enables designers to iteratively synthesize, verify and "lock" discrete function blocks, resulting in an incremental design flow that's not only more natural for legacy RTL developers, but also more predictable across design iterations.



Figure 2. Top-down synthesis using a fixed-hierarchy design produces results that can vary widely with each compilation iteration (top). The increased designer control afforded in Catapult 8 increases predictability and decreases iterative compilation times (bottom).

The results, according to Calypto, speak for themselves (Figure 3). In some cases, a C-based design actually outperforms its handcrafted equivalent in terms of die area. And in the cases that Calypto showcases, the C-based design can be developed much faster than was previously possible using a lower-abstraction-level approach. Speaking of C, it's important to note that with Catapult 8, Calypto treats SystemC as an equal to C++, which was the primary language supported in previous Catapult releases. "The language wars are over; both C++ and SystemC are great for HLS," say Bowyer and Milligan, hearkening back to the unresolved VHDL-versus-Verilog battles of days past.


Figure 3. Modern HLS toolsets can in some cases produce area-consumption results that are superior to those of handcrafted design alternatives. But even if the HLS-derived design ends up a bit larger, you'll be able to get it done much quicker.

Calypto reports that it has also more fully rolled its longstanding verification and power optimization expertise into the new Catapult release. Catapult 8 has been fundamentally re-architected for verification closure, according to Bowyer and Milligan, in partnership with leading Catapult customers. It's been engineered to integrate more cleanly into existing verification flows, to consider verification coverage more meaningfully during synthesis, and subsequently to extract design knowledge and pass it to verification tools and users. By enabling designers to explore power consumption-versus-performance-versus-area alternatives by changing design constraints, rather than modifying source code, Catapult enables designers to rapidly converge on an optimal synthesized design for their application needs. And the optional (at additional cost) Catware component library, which focuses on digital signal processing functions such as filters and FFTs and which is delivered as both C++ and SystemC source code, can potentially further shorten your design time (Figure 4).


Figure 4. Digital signal processing functions dominate the optional Catware component library, which is supplied in both C++ and SystemC source code forms.

More generally, Catapult 8 benefits from the same tight coupling between algorithms described in software and silicon implementations of those algorithms that's long compelled designers to consider HLS approaches. Bowyer and Milligan specifically highlighted video compression and decoding, high-resolution image processing and advanced communications as digital signal processing applications that have received significant HLS interest. Calypto has had notable success to date with Google and its partners such as Verisilicon; a WebM (VP9) video decoder developed using Catapult HLS, for example, was built in less than 6 months versus a 1-year estimate for a traditional RTL approach, took ~69,000 lines (14 blocks) of C++ versus ~300,000 lines of RTL source, consumed only 0.9 mm² at 28 nm, and ran at 243 MHz. For chip designers focused on complex, algorithm-based hardware blocks, high-level synthesis is worth considering.
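For readers who want the decoder comparison as ratios, the following sketch simply does the arithmetic on the figures quoted above. The struct and function names are hypothetical, and because the article gives the inputs as approximations ("~69,000", "~300,000", "less than 6 months"), the results should be read as rough estimates rather than measured values.

```cpp
#include <cassert>

// Figures quoted in the article for the WebM (VP9) decoder project.
// The line counts are approximate, and the HLS schedule is an upper bound.
struct DecoderFigures {
    double hls_source_lines = 69000.0;   // ~69,000 lines of C++
    double rtl_source_lines = 300000.0;  // ~300,000 lines of RTL
    double hls_months = 6.0;             // "less than 6 months"
    double rtl_months = 12.0;            // 1-year estimate
};

// Ratio of a baseline quantity to a candidate quantity.
double ratio(double baseline, double candidate) {
    assert(candidate > 0.0);
    return baseline / candidate;
}

// How many times smaller the C++ source is than the RTL source.
double source_reduction(const DecoderFigures& f) {
    return ratio(f.rtl_source_lines, f.hls_source_lines);
}

// Lower bound on the schedule improvement, since the HLS project
// finished in less than the 6 months used here.
double schedule_speedup(const DecoderFigures& f) {
    return ratio(f.rtl_months, f.hls_months);
}
```

On these inputs the C++ source comes out roughly 4.3 times smaller than the RTL source, and the schedule at least 2 times shorter than the traditional RTL estimate.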
