Imagination Technologies' New GPUs Tackle Computer Vision Arithmetic

Submitted by BDTI on Mon, 02/08/2016 - 22:03

Just prior to the 2014 Consumer Electronics Show, Imagination Technologies unveiled its first computer vision processor offering with the announcement of the Raptor core architecture, the first product iteration of which was released at the February 2014 Mobile World Congress. Now, the company is more fully embracing computer vision requirements with its two new PowerVR Series7XT Plus GPU cores. And, reading between the lines of a recent briefing, Imagination Technologies continues to seriously evaluate what else it needs to do to fully address customers' vision processing needs.

Raptor is fundamentally an ISP (image signal processor), although it offers a limited set of vision processing features, such as stereo sensor support for depth determination. Its capabilities are extensible by means of its flexible processing pipeline, which other processing resources in a SoC can "tap" into at various points both to extract and insert data. Additional computer vision processing might occur, for example, on a MIPS CPU core. And now, it might also take place on a PowerVR graphics core.

The two- and four-cluster Series7XT Plus GPUs are built on the foundation of the Series7XT architecture, one of the two GPU families released by Imagination Technologies at last year's CES. Although computer vision libraries such as OpenCV often rely on floating-point math, reflecting their PC heritage, embedded applications often don't require floating-point and can gain performance, power and cost benefits from instead relying on integer math. In fact, as Cadence learned in its IVP to IVP-EP vision core evolution, full 32-bit integer precision often isn't necessary; many vision applications will happily trade reduced precision for faster results.
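
As a rough illustration of that trade-off, the sketch below compares a floating-point grayscale conversion with an integer-only version that uses 8-bit weights. The choice of kernel is an assumption made for illustration (the widely used BT.601 luma weights, scaled by 256), and the function names are hypothetical; nothing here is taken from Imagination Technologies' or Cadence's software.

```python
# Integer-only approximation of a floating-point luma conversion.
# The weights are the common BT.601 luma coefficients scaled by 256
# (an assumption for illustration; the article names no specific kernel).


def gray_float(r, g, b):
    """Reference conversion using floating-point weights."""
    return round(0.299 * r + 0.587 * g + 0.114 * b)


def gray_fixed(r, g, b):
    """Same conversion with 8-bit integer weights; 77 + 150 + 29 == 256."""
    return (77 * r + 150 * g + 29 * b + 128) >> 8


# Largest disagreement between the two versions over a coarse RGB sweep.
levels = range(0, 256, 15)
worst = max(abs(gray_fixed(r, g, b) - gray_float(r, g, b))
            for r in levels for g in levels for b in levels)
```

With these weights the two versions never differ by more than one gray level, which many vision pipelines can tolerate in exchange for multiply, add and shift operations on small integers.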

The PowerVR Series7XT Plus architecture reflects both of these realities. Its enhancements begin with the new Image Processing Data Master block at the front end of the pipeline, which when activated completely disables the core's floating-point resources to save power (Figure 1).


Figure 1. Imagination Technologies has expanded the legacy floating point-centric support of its PowerVR graphics architecture with integer arithmetic capabilities specifically targeting computer vision opportunities.

Also, in past PowerVR core families, 32-bit integer operations were inefficiently handled by the 32-bit floating-point ALU resources inside each unified shading cluster (USC). Now, however, Imagination Technologies has added two dedicated 32-bit scalar integer ALUs to each pipeline (with 16 pipelines per cluster). And each 32-bit scalar ALU can alternatively implement up to two 16-bit vector or four 8-bit vector integer operations. The INT32, INT16 and INT8 hardware resources are largely shared, not distinct, as a company-supplied cluster block diagram might otherwise imply (Figure 2).


Figure 2. Vector INT8 and INT16 resources are shared with those of the scalar INT32 ALUs, not distinct, as a quick perusal of the cluster block diagram might incorrectly suggest.
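
The idea of running two 16-bit or four 8-bit operations through a 32-bit datapath can be emulated in software. The sketch below packs several narrow lanes into one 32-bit word and adds them lane by lane without letting carries cross lane boundaries. It is only an illustration of sub-word parallelism under our own assumptions: the helper names are hypothetical, and it does not describe the PowerVR instruction set or how the hardware actually shares its resources.

```python
# Software emulation of sub-word (packed) integer addition in a 32-bit word.
# Illustrative only: the helper names are hypothetical and this is not the
# PowerVR instruction set.

MASK32 = 0xFFFFFFFF


def pack(lanes, bits):
    """Pack unsigned lane values into one 32-bit word, lane 0 in the low bits."""
    assert bits * len(lanes) == 32
    word = 0
    for i, value in enumerate(lanes):
        assert 0 <= value < (1 << bits)
        word |= value << (bits * i)
    return word


def unpack(word, bits):
    """Split a 32-bit word back into its unsigned lanes."""
    return [(word >> (bits * i)) & ((1 << bits) - 1) for i in range(32 // bits)]


def packed_add(a, b, bits):
    """Lane-wise wrapping add; carries never cross a lane boundary."""
    lanes = 32 // bits
    top = sum(1 << (bits * i + bits - 1) for i in range(lanes))  # lane MSBs
    low = MASK32 ^ top                                           # all other bits
    # Add everything except each lane's top bit, then restore the top bits
    # with XOR so that no carry leaks into the neighbouring lane.
    return (((a & low) + (b & low)) ^ ((a ^ b) & top)) & MASK32


a8 = pack([250, 1, 128, 255], 8)
b8 = pack([10, 2, 128, 1], 8)
sum8 = unpack(packed_add(a8, b8, 8), 8)       # four 8-bit adds in one word

a16 = pack([65535, 1000], 16)
b16 = pack([1, 2000], 16)
sum16 = unpack(packed_add(a16, b16, 16), 16)  # two 16-bit adds in one word
```

Each lane wraps independently (250 + 10 becomes 4 in an 8-bit lane, for instance), which is the behavior a packed integer unit needs to deliver when it reuses one wide adder for several narrow operands.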

When asked about the reason for the blend of 32-bit scalar and 16- and 8-bit vector capabilities implemented in the new integer ALUs, Peter McGuinness, Director of Technology Marketing, noted that this combination was fundamentally fueled by the drive for minimal power consumption, and more generally commented:

The scalar/vector structure is largely opportunistic, in order to efficiently get multiple issues of lower-order INT operations into the INT32 pipeline. Clearly, the performance is heavily dependent on the compiler (where we have a lot of experience, from the old SGX vector architecture) and on the actual data types being streamed through the integer engine. Vec4 [editor note: an OpenGL Shading Language data type] is great for many graphics operations (RGBA and 4-space vectors being the bulk of operations), whereas in vision we rarely if ever have alpha data and there really aren’t any higher order vector operations going on. The saving grace is that image data is typically planar; therefore, adjacent pixels can be grouped into 4-wide or two-wide "vectors", meaning that good occupancy can be achieved. It doesn’t take a lot of thought to see how 4x4 or 2x2 blocks can be very useful in imaging; think, for example, of Sobel operators or optical flow functions.
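
McGuinness' point about grouping adjacent pixels can be sketched with a horizontal Sobel operator that produces results for four neighboring pixels at a time. This is a minimal plain-Python illustration, not Imagination Technologies' code: the function name is hypothetical, lists of rows stand in for a single image plane, and the four-pixel loop marks the work that a 4-wide integer vector operation would cover.

```python
# Horizontal Sobel gradient computed for four adjacent pixels per call,
# mirroring the "4-wide vector" grouping of planar image data.
# Hypothetical helper; plain Python lists stand in for an image plane.


def sobel_x_4wide(plane, y, x0):
    """Return the horizontal Sobel response for pixels (y, x0) .. (y, x0 + 3).

    `plane` is a list of rows of 8-bit values; the caller must keep a
    one-pixel border around the requested group.
    """
    above, here, below = plane[y - 1], plane[y], plane[y + 1]
    group = []
    for x in range(x0, x0 + 4):
        gx = ((above[x + 1] - above[x - 1])
              + 2 * (here[x + 1] - here[x - 1])
              + (below[x + 1] - below[x - 1]))
        group.append(gx)
    return group


# A vertical edge: dark columns on the left, bright columns on the right.
row = [0, 0, 0, 100, 100, 100, 100]
plane = [row[:], row[:], row[:]]
edge_response = sobel_x_4wide(plane, 1, 1)
```

For 8-bit input the magnitude of each response is at most 4 x 255 = 1020, so the results fit comfortably in signed 16-bit values, with no need for 32-bit or floating-point arithmetic.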

Other architecture enhancements strongly relate to Imagination Technologies' newly added support for OpenCL 2.0, versus its earlier, more limited support for v1.2 of the heterogeneous processing API from Khronos (Figure 3). They include explicit support for dynamic parallelism between the GPU and CPU, minimizing "handshaking" overhead between the two, along with the ability to share a common virtual memory space between them. Other more general performance improvements in PowerVR Series7XT Plus, which will benefit computer vision and conventional graphics operations alike, include a doubling of the maximum memory-transfer burst size and various cache efficiency enhancements.


Figure 3. Imagination Technologies has improved its cores' OpenCL and other capabilities via feature additions and tweaks.

When asked why similar dedicated integer arithmetic capabilities weren't added to the company's lower-end PowerVR Series7XE graphics cores, which include fewer processing resources along with a reduced feature set (no security, no tessellation, no full-profile OpenCL support, etc.), McGuinness noted that the kinds of systems these low-end GPUs target are usually low-performance, and therefore not ideal candidates for the inclusion of vision processing. And when asked about Imagination Technologies' thoughts regarding dedicated-function vision processing cores (VPUs), running either deep learning or more traditional algorithms, McGuinness responded that he sees two primary applications for them: in systems that are display-less, and therefore do not require a GPU, and in high-end designs for which the tailored processing "muscle" of a VPU is necessary, such as in one of Mobileye's MIPS-based ADAS chips.

McGuinness was understandably reticent about disclosing any future VPU core plans from Imagination Technologies, although he did note that the enhancements found in PowerVR Series7XT Plus would also be carried forward into the company's PowerVR Series8 GPUs, which are currently under development. Nearer term, McGuinness indicated that top-tier customers are currently designing SoCs using both the two- and four-cluster PowerVR Series7XT Plus core variants, with first silicon expected in approximately a year. And, in other notable computer vision news from the company, its earlier PowerVR Series6 GPU cores have recently passed OpenVX vision API conformance with Khronos.
