MIPS Announces High-Performance Superscalar Core

Submitted by BDTI on Wed, 06/20/2007 - 19:00

MIPS has introduced the MIPS 74K, a new, high-performance synthesizable general-purpose microprocessor core. The 74K targets demanding multimedia and networking applications, such as H.264 and WiMAX, and according to MIPS, the core has already been shipped to initial licensees.

The 74K is a 32-bit, dual-issue, asymmetric superscalar architecture that supports out-of-order instruction execution and uses a 17-stage pipeline. According to MIPS, the 74K can achieve speeds of up to 1 GHz when synthesized in a 65 nm process—without the use of structured or hard IP. This clock speed is higher than the synthesized core speed of a key competitor—ARM’s Cortex-A8—but as we discuss below, clock speed doesn’t necessarily predict performance.

MIPS says that the core-plus-cache area of the 74K varies from 2.1 to 2.5 mm² (in 65 nm) depending on whether the synthesis is optimized for speed or size. The 74K core architecture is shown in Figure 1.

Figure 1: MIPS 74K Block Diagram.

The 74K can reorder instructions so that they execute in a different order than they appear in the program. MIPS claims that this support for out-of-order execution improves the core’s performance by allowing instructions to execute in parallel more often than they could with in-order execution, and that the benefit extends to existing object code that has been compiled for earlier MIPS cores. Out-of-order execution is common in desktop CPUs, but is unusual in embedded processors. Where desktop CPUs might look at roughly 100 instructions before deciding how to reorder them, the 74K limits its reordering window to 8 instructions. MIPS believes that this limitation represents a good tradeoff between performance and complexity.
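To make the idea of a bounded reordering window concrete, here is a minimal toy model in Python. It is only a sketch under loud assumptions and not a description of the 74K's actual scheduler: the function name `cycles_to_issue`, the instruction tuples, and the latencies are all hypothetical, each register is written at most once, producers precede consumers, and functional-unit limits are ignored.

```python
import math

def cycles_to_issue(program, window, width=2, in_order=False):
    """Count cycles needed to issue every instruction in a toy dual-issue model.

    program:  list of (dest, sources, latency) tuples in program order.
    window:   how many of the oldest un-issued instructions may be examined.
    width:    maximum instructions issued per cycle.
    in_order: if True, stop scanning at the first instruction that is not ready.
    """
    # Results of not-yet-issued instructions are unavailable (infinity).
    ready_at = {dest: math.inf for dest, _, _ in program}
    pending = list(range(len(program)))
    cycle = 0
    while pending:
        issued = []
        for idx in pending[:window]:
            _, sources, _ = program[idx]
            # Registers never written by the program count as inputs ready at cycle 0.
            if all(ready_at.get(s, 0) <= cycle for s in sources):
                issued.append(idx)
                if len(issued) == width:
                    break
            elif in_order:
                break
        for idx in issued:
            dest, _, latency = program[idx]
            ready_at[dest] = cycle + latency
            pending.remove(idx)
        cycle += 1
    return cycle

# Hypothetical six-instruction sequence: two loads, each followed by a
# dependent add, then two independent multiplies.
PROGRAM = [
    ("r1", [], 3),      # load
    ("r2", ["r1"], 1),  # add, depends on the first load
    ("r3", [], 3),      # load
    ("r4", ["r3"], 1),  # add, depends on the second load
    ("r5", [], 1),      # independent multiply
    ("r6", [], 1),      # independent multiply
]

in_order_cycles = cycles_to_issue(PROGRAM, window=8, in_order=True)
out_of_order_cycles = cycles_to_issue(PROGRAM, window=8)
print(in_order_cycles, out_of_order_cycles)
```

In this toy sequence the out-of-order model issues everything in fewer cycles because the second load and the independent multiplies slip past the stalled add. Real gains depend on the code and on microarchitectural details the sketch leaves out.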

The 74K supports the same DSP-oriented instruction set extensions as the 24KE, plus a number of new application-oriented instructions. For example, both cores can execute two 16x16-bit fixed-point multiply-accumulates, but the 74K can also perform 16x16 integer multiplies. The 74K is binary compatible with previous MIPS cores; existing MIPS object code doesn’t need to be recompiled to run on the new core, though rework will be needed to take advantage of the new DSP instructions.

According to MIPS, the 74K compiler doesn’t currently generate instructions from the DSP Application-Specific Extension (ASE) on its own. Instead, the compiler supports intrinsics that invoke DSP ASE instructions, and MIPS provides optimized DSP library functions including filters, transforms, vector math, and several algorithms used in H.264. Intrinsics and library functions are likely to be sufficient for many applications. However, if a programmer needs to implement proprietary signal processing algorithms or meet very aggressive performance targets (as may be the case in, say, a handheld audio player), hand-optimized assembly code may be required. Developing such code won’t be easy; while the 74K isn’t as complicated as a desktop CPU, its dynamic features (including out-of-order instruction scheduling) and deep pipeline will make it challenging for programmers to understand code timing and implement effective optimizations.

As usual, the new MIPS core will be competing with ARM cores, particularly the Cortex-A8, which is also a superscalar architecture and targets similar applications. The Cortex-A8 supports the NEON signal processing instruction-set extensions and, according to ARM, it is projected to operate at up to 1 GHz in a 65 nm process using custom layout techniques; the clock speed of a synthesized core will be closer to 850 MHz. ARM lists the area of the Cortex-A8 plus cache as “< 4 mm²” (excluding NEON), making it much bigger than the 74K. Unlike the 74K, however, the Cortex-A8 is already in silicon; Texas Instruments is currently offering Cortex-A8-based OMAP3 chips.

It’s tempting to try to compare the DSP performance of the 74K to that of the Cortex-A8 by comparing their clock speeds. That would be a mistake, though, in part because the NEON DSP-oriented instruction set extensions are more powerful than the 74K’s DSP ASE instructions. For example, NEON supports 4-way, 16-bit SIMD operations (the 74K supports only 2-way) and is able to execute four 16-bit multiplies in parallel rather than two.
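The effect of SIMD width can be illustrated with a small behavioral model in Python. This is a sketch only: `simd_mac` and `dot_product` are hypothetical helpers, not NEON or DSP ASE instructions, and the model ignores saturation, accumulator width, memory bandwidth, and pipeline effects. It simply counts how many multiply-accumulate operations a dot product needs at a given lane count.

```python
def to_int16(x):
    """Wrap an integer to a signed 16-bit value."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x

def simd_mac(acc, a_lanes, b_lanes):
    """Model one SIMD multiply-accumulate: add the lane-wise 16x16 products to acc."""
    return acc + sum(to_int16(a) * to_int16(b) for a, b in zip(a_lanes, b_lanes))

def dot_product(a, b, lanes):
    """Return (result, number of modeled SIMD MAC operations) for a lane count."""
    acc, ops = 0, 0
    for i in range(0, len(a), lanes):
        acc = simd_mac(acc, a[i:i + lanes], b[i:i + lanes])
        ops += 1
    return acc, ops

samples = list(range(1, 9))   # eight 16-bit samples
coeffs = [2] * 8              # eight 16-bit coefficients

result_2way, ops_2way = dot_product(samples, coeffs, lanes=2)
result_4way, ops_4way = dot_product(samples, coeffs, lanes=4)
print(result_2way, ops_2way)  # same result, four 2-way operations
print(result_4way, ops_4way)  # same result, two 4-way operations
```

Both lane counts produce the same result, but the 4-way version needs half as many operations. An operation count is not a cycle count, so this shows why clock speed alone is a poor proxy rather than predicting the actual gap.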

MIPS states that, in a given process, the 74K will provide a 60% speedup on DSP code relative to the 24KE, with about 30% coming from the higher frequency and 30% coming from architectural enhancements. This speedup will be attractive to MIPS users looking to upgrade. BDTI believes, however, that the 74K’s DSP performance may lag that of the Cortex-A8. BDTI has not yet benchmarked the 74K, but based on an analysis of BDTI’s benchmark results for the Cortex-A8 and 24KE and using MIPS’ 60% speedup figure, it appears that an 850 MHz Cortex-A8 will be about 75% faster than a 1 GHz 74K on typical DSP tasks.*
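The arithmetic behind these figures can be checked with a few lines of Python. The sketch assumes that frequency and architectural gains compound multiplicatively, which the source figures do not state explicitly. The function names are illustrative, and the 750 MHz 24KE clock comes from the footnote to this article.

```python
def combined_speedup(freq_gain, arch_gain):
    """Overall fractional gain if the two gains multiply."""
    return (1 + freq_gain) * (1 + arch_gain) - 1

def implied_arch_gain(total_gain, new_mhz, old_mhz):
    """Architectural gain left over after removing the clock-speed ratio."""
    return (1 + total_gain) / (new_mhz / old_mhz) - 1

def per_clock_ratio(speed_ratio, fast_core_mhz, slow_core_mhz):
    """Per-cycle throughput ratio implied by an overall speed ratio."""
    return speed_ratio * slow_core_mhz / fast_core_mhz

# 30% from frequency and 30% from architecture compound to about 69%,
# so MIPS' "60%" is evidently a rounded or conservative figure.
print(round(combined_speedup(0.30, 0.30), 2))

# A 1 GHz 74K versus a 750 MHz 24KE is a 33% clock increase, so a 60%
# total speedup implies roughly a 20% architectural gain.
print(round(implied_arch_gain(0.60, 1000, 750), 2))

# An 850 MHz Cortex-A8 that is 75% faster than a 1 GHz 74K would be doing
# roughly twice the DSP work per clock cycle.
print(round(per_clock_ratio(1.75, 850, 1000), 2))
```

Read this way, the estimated gap comes from work done per cycle, not from clock rate.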

Of course, speed isn’t the only consideration in choosing a core; in many embedded applications, it isn’t even the most important one. The 74K core is much smaller than the Cortex-A8, and it may have better performance per area. In some applications the 74K will be fast enough to eliminate the need for a separate DSP processor, which can significantly simplify system design and implementation.

* This analysis is based on the assumption that a 1 GHz 74K is 60% faster than a 750 MHz 24KE.
