New Synopsys Processor Core Targets Traditional- and Deep Learning-based Embedded Vision

Submitted by BDTI on Mon, 07/04/2016 - 22:03

In early 2015, Synopsys' DesignWare EV5x processor core family attracted notable attention for its unique coprocessor engine focused on CNNs (convolutional neural networks) for object recognition and other vision functions. The company's new EV6x processor core family includes an upgraded CNN engine that delivers substantial performance gains over its predecessor. In a nod to customers who prefer "classical" computer vision algorithms, the EV6x also decouples that engine from the rest of the core, which now includes 512-bit vector DSPs (Figure 1).

Figure 1. Synopsys' new DesignWare EV6x family (top) comes in three variants and includes a 512-bit vector DSP engine, while making the CNN engine an option (bottom), in contrast with its EV5x predecessor.

EV6x family members include a one- to four-core "Vision CPU," which handles both control functions and image pre-processing operations such as greyscale conversion, according to Senior Product Marketing Manager Mike Thompson. The Vision CPU cores start with the same 32-bit scalar processor used in the prior-generation EV5x and add a new 512-bit vector DSP engine capable of 155 GOPS peak throughput at a clock speed of 800 MHz, significantly boosting per-core performance (Figure 2). The vector DSP handles 8-, 16- and 32-bit fixed-point data types. Also carried over from the EV5x is an optional IEEE-compliant floating-point coprocessor that supports both single- and double-precision operations.
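
A 512-bit vector holds 64 8-bit, 32 16-bit, or 16 32-bit elements. The following sketch computes those lane counts and emulates a fixed-point greyscale conversion processed one vector's worth of pixels at a time. The Q8 luma weights (77, 150, 29, approximating the BT.601 coefficients 0.299, 0.587 and 0.114) and the chunking scheme are illustrative assumptions, not Synopsys' implementation.

```python
VECTOR_BITS = 512  # width of the EV6x vector DSP datapath, per the article

def lanes(element_bits: int) -> int:
    """Number of elements of the given width that fit in one 512-bit vector."""
    if VECTOR_BITS % element_bits != 0:
        raise ValueError("element width must divide the vector width")
    return VECTOR_BITS // element_bits

# Assumption: Q8 fixed-point approximations of the BT.601 luma weights (sum = 256).
W_R, W_G, W_B = 77, 150, 29

def rgb_to_grey_q8(pixels: list[tuple[int, int, int]]) -> list[int]:
    """Convert 8-bit (r, g, b) tuples to 8-bit grey, one vector's worth at a time.

    Each rounded weighted sum is at most 255 * 256 + 128 = 65408, so it fits
    in an unsigned 16-bit lane; a 512-bit vector thus covers lanes(16) = 32
    pixels per step.
    """
    step = lanes(16)
    grey: list[int] = []
    for start in range(0, len(pixels), step):
        chunk = pixels[start:start + step]  # emulates one vector load
        # Multiply-accumulate in Q8, add 128 to round, shift back to 8 bits.
        grey.extend((W_R * r + W_G * g + W_B * b + 128) >> 8 for r, g, b in chunk)
    return grey

print([lanes(w) for w in (8, 16, 32)])
print(rgb_to_grey_q8([(255, 255, 255), (0, 0, 0), (255, 0, 0)]))
```

Because the weights sum to exactly 256, pure white maps to 255 and pure black to 0 with no overflow or saturation logic.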

Figure 2. Newly added to each Vision CPU core is a 512-bit vector DSP which Synopsys touts as being capable of 155 GOPS peak performance at 800 MHz, translating to 620 GOPS max for the quad-core EV64 variant.
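
The 620 GOPS figure is simply the per-core peak multiplied by four. The sketch below reproduces that arithmetic. It also estimates peak throughput at the 500 MHz clock used in the area and power estimates below. That estimate assumes peak throughput scales linearly with clock frequency, which is our assumption and not a Synopsys claim.

```python
PER_CORE_GOPS = 155.0        # vector DSP peak per Vision CPU core, per Synopsys
REFERENCE_CLOCK_MHZ = 800.0  # clock at which the 155 GOPS figure is quoted

def cluster_peak_gops(cores: int, clock_mhz: float = REFERENCE_CLOCK_MHZ) -> float:
    """Peak vector-DSP throughput for a one- to four-core Vision CPU cluster.

    Assumption: throughput scales linearly with both core count and clock.
    """
    if not 1 <= cores <= 4:
        raise ValueError("EV6x Vision CPU clusters have one to four cores")
    return PER_CORE_GOPS * cores * clock_mhz / REFERENCE_CLOCK_MHZ

for name, cores in (("EV61", 1), ("EV62", 2), ("EV64", 4)):
    print(name, cluster_peak_gops(cores), cluster_peak_gops(cores, 500.0))
```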

Alongside the one- to four-core Vision CPU cluster, and capable of operating in parallel with it, is a second-generation CNN engine (Figure 3). Unspecified enhancements, stemming from both improved design efficiency and customer feedback on the first-generation CNN core, raise peak performance to up to 800 MACs/cycle, versus 64 MACs/cycle for the previous CNN core. Supported data types are 8-, 16- and 32-bit. Support for 10-bit and 12-bit data is under evaluation; these smaller data sizes are aimed at the sort of low-precision, high-accuracy CNN designs being created by research teams at Google, Stanford and elsewhere.
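
The generational gain per cycle is 800 / 64 = 12.5x. Converting MACs/cycle into absolute throughput requires a clock rate, and the article does not give one for the CNN engine. The sketch below therefore assumes, purely for illustration, the 800 MHz quoted for the vector DSP. It also follows the common convention of counting each MAC as two operations.

```python
EV5X_CNN_MACS_PER_CYCLE = 64
EV6X_CNN_MACS_PER_CYCLE = 800

def gmacs(macs_per_cycle: int, clock_mhz: float) -> float:
    """Peak billions of multiply-accumulates per second at a given clock."""
    return macs_per_cycle * clock_mhz / 1000.0

# Per-cycle speedup needs no assumptions.
speedup = EV6X_CNN_MACS_PER_CYCLE / EV5X_CNN_MACS_PER_CYCLE

# Assumption: CNN engine clocked at 800 MHz (not stated in the article).
assumed_clock_mhz = 800.0
peak = gmacs(EV6X_CNN_MACS_PER_CYCLE, assumed_clock_mhz)

# Convention: one MAC = one multiply + one add = 2 operations.
print(f"{speedup}x per cycle; {peak} GMAC/s = {2 * peak} GOPS")
```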

Figure 3. The second-generation CNN engine delivers significantly higher MACs/cycle peak throughput and can be excluded for designs that aren't deep learning-based.

Low-precision data also has area efficiency advantages. According to Synopsys' Thompson, reducing the MACs from 16 to 12 bits cuts the required silicon real estate roughly in half. Even so, unlike the EV5x family, in which the CNN core was integral to the architecture, the EV6x makes the CNN engine optional. Synopsys based this decision on feedback from customers who plan to use conventional (rather than deep learning-based) vision functions and don't want to spend any silicon area on an unused CNN engine.
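
Thompson's "roughly in half" is consistent with a common rule of thumb: the area of an array multiplier grows roughly with the product of its operand widths, that is, with its number of partial-product bits. The model below applies only that rule of thumb. It ignores accumulators, registers and wiring, which scale differently, and it is not based on Synopsys data.

```python
def relative_multiplier_area(bits_a: int, bits_b: int,
                             ref_a: int = 16, ref_b: int = 16) -> float:
    """Area of a bits_a x bits_b multiplier relative to a ref_a x ref_b one.

    Rule-of-thumb assumption: area is proportional to the number of
    partial-product bits, i.e. to bits_a * bits_b.
    """
    return (bits_a * bits_b) / (ref_a * ref_b)

print(relative_multiplier_area(12, 12))  # → 0.5625 (144 / 256)
print(relative_multiplier_area(8, 8))    # → 0.25 (64 / 256)
```

By this estimate, a 12-bit multiplier needs about 56% of the area of a 16-bit one, in line with the "roughly in half" figure.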

In a recent briefing, Thompson provided the following preliminary area and power consumption estimates for several representative EV6x configurations. None of them includes the optional FPU, and all are based on a 28 nm process:

  • 500 MHz EV61 (single core) without CNN engine, but including 128 KB of local memory plus cache: 1.5 mm², 220 mW.
  • 500 MHz EV62 (dual core) with CNN engine, including approximately 512 KB of local memory (300 KB of it for CNN): 5 mm², 800 mW.
  • 500 MHz EV64 (quad core) with CNN engine and full memory allotment (unspecified size): 8.2 mm², 1.3 W.

On a 16 nm process, according to Thompson, area and power consumption would both be cut roughly in half in each case.
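
Captured as data, the three estimates are easy to compare and project. The sketch below records the quoted 28 nm figures and models Thompson's "roughly in half" 16 nm scaling as a flat 0.5 factor on both area and power. The exact factor and the hypothetical `Estimate`/`project` names are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Estimate:
    config: str
    area_mm2: float
    power_mw: float

# Preliminary 28 nm, 500 MHz estimates quoted by Synopsys (FPU excluded).
ESTIMATES_28NM = [
    Estimate("EV61, no CNN", 1.5, 220.0),
    Estimate("EV62 + CNN", 5.0, 800.0),
    Estimate("EV64 + CNN", 8.2, 1300.0),
]

def project(est: Estimate, scale: float = 0.5) -> Estimate:
    """Hypothetical helper: apply a flat 16 nm scaling factor to area and power."""
    return Estimate(est.config, est.area_mm2 * scale, est.power_mw * scale)

for est in ESTIMATES_28NM:
    p16 = project(est)
    print(f"{est.config}: 28 nm {est.area_mm2} mm^2, {est.power_mw:.0f} mW"
          f" -> 16 nm ~{p16.area_mm2} mm^2, {p16.power_mw:.0f} mW")
```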

Synopsys was an early advocate, and remains an avid one, of the Khronos Group's OpenVX open standard API for cross-platform acceleration of computer vision applications, as a glance at the company's planned tool suite makes obvious (Figure 4). For more information on OpenVX and Synopsys' support for it, see the recent Embedded Vision Summit presentation from R&D Director Pierre Paulin, a preview version of which is below (Video 1). The EV6x core family won't be available for general licensing until October, although initial licensee engagements with partners such as Faraday and Inuitive are already underway. Already available, however, is the EV SDK Option for the MetaWare Development Toolkit, which includes architecture-optimized OpenCV and OpenVX libraries, an OpenVX runtime framework, and an OpenCL C compiler.

Figure 4. EV6x-optimized OpenCV and OpenVX libraries and associated compilers and runtimes, along with CNN-related utilities, feature prominently in Synopsys' tool suite plans.

Video 1. Synopsys R&D Director Pierre Paulin delivers the presentation "Programming Embedded Vision Processors Using OpenVX" at the May 2016 Embedded Vision Summit.
