ARM Takes New Direction With Latest Core

Submitted by BDTI on Mon, 10/10/2005 - 20:00

Last week ARM announced its latest processor core, the Cortex-A8. The Cortex-A8 is the highest-performance ARM processor to date, and it differs in many ways from older ARM cores. Perhaps the most obvious difference is the name: Instead of the traditional naming scheme (ARM7, ARM9, ARM11, etc.), the processor uses the new “Cortex” brand. The “A” in the name indicates that the Cortex-A8 is an “application” processor targeting systems that require a full-featured OS like Linux. Target markets include smart phones, set-top boxes, printers, and automotive infotainment applications.

Despite the change in naming scheme, the Cortex-A8 is the successor to the ARM11, and it retains many of the ARM11's architectural features. However, the Cortex-A8 also adds several major new features. Most notably, the Cortex-A8 is the first processor to use the NEON signal processing extensions. These instruction set extensions dramatically boost the processor's signal processing capabilities. For example, the Cortex-A8 can complete up to four 16-bit multiply-accumulate (MAC) operations per cycle, whereas the ARM11 can complete at most two. (For more on NEON, see the October 2004 edition of Inside DSP.)
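
To put these peak MAC rates in perspective, the sketch below computes idealized cycle counts for a 64-tap FIR filter, which needs one multiply-accumulate per tap for each output sample. The filter length and the function name are illustrative assumptions, and the model ignores loads, stores, and loop overhead, so real code would need more cycles on both cores.

```python
# Peak 16-bit MAC rates quoted in the text.
PEAK_MACS_PER_CYCLE = {"ARM11": 2, "Cortex-A8": 4}


def fir_cycles_per_output(taps: int, macs_per_cycle: int) -> int:
    """Idealized cycles per FIR output sample (hypothetical helper).

    Assumes the core sustains its peak MAC rate and that MACs are the
    only cost; a partially filled final cycle still costs a full cycle.
    """
    return -(-taps // macs_per_cycle)  # ceiling division


if __name__ == "__main__":
    TAPS = 64  # illustrative filter length
    for core, rate in PEAK_MACS_PER_CYCLE.items():
        print(f"{core}: {fir_cycles_per_output(TAPS, rate)} cycles per output")
```

Under these assumptions the ARM11 needs 32 cycles per output and the Cortex-A8 needs 16, before accounting for the Cortex-A8's higher clock rate.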

The Cortex-A8 pipeline also differs notably from the pipelines of other ARM cores. The Cortex-A8 is the first ARM core to use a superscalar pipeline; specifically, it uses a dual-issue, in-order superscalar pipeline. The Cortex-A8 also has an unusually long pipeline: its main pipeline is 13 stages long, and NEON instructions pass through an additional 10 pipeline stages. In comparison, the ARM11 pipeline has only 8 stages. According to ARM, the long pipeline will allow the Cortex-A8 to achieve high clock rates, potentially exceeding 1 GHz in a 65 nm process.
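
The toy model below illustrates what dual-issue, in-order execution means. Its pairing rule is a hypothetical simplification, not ARM's actual issue logic: two adjacent instructions issue together unless the second depends on the first, every result is ready by the next cycle, and instructions never issue out of program order.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    """A toy instruction: one destination register and its source registers."""
    dest: str
    srcs: tuple[str, ...]


def dual_issue_cycles(program: list[Instr]) -> int:
    """Count issue cycles on a hypothetical in-order, dual-issue machine.

    Assumed rules (a simplification, not the Cortex-A8's real pairing rules):
      * instructions issue strictly in program order;
      * at most two instructions issue per cycle;
      * the second of a pair cannot issue with the first if it reads or
        overwrites the first one's destination register;
      * every result is available in the following cycle.
    """
    cycles = 0
    i = 0
    while i < len(program):
        first = program[i]
        i += 1
        if i < len(program):
            second = program[i]
            independent = (first.dest not in second.srcs
                           and first.dest != second.dest)
            if independent:
                i += 1  # both instructions issue in this cycle
        cycles += 1
    return cycles


if __name__ == "__main__":
    parallel = [Instr(f"r{k}", ("r0",)) for k in range(1, 5)]
    chain = [Instr(f"r{k}", (f"r{k - 1}",)) for k in range(1, 5)]
    print(dual_issue_cycles(parallel), dual_issue_cycles(chain))
```

In this model four independent instructions take two cycles, while a four-instruction dependency chain takes four, because a dependent instruction gets no benefit from the second issue slot. This is why instruction scheduling matters on an in-order superscalar core.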

The NEON extensions, superscalar pipeline, and high clock rates should make the Cortex-A8 much faster than older ARM cores on signal-processing tasks. BDTI has not yet performed an in-depth analysis of the Cortex-A8, but it has benchmarked the ARM1136 and found it to be about as fast as mid-range DSPs. Since the Cortex-A8 should be substantially faster than the ARM1136, this suggests that the Cortex-A8 will be fast enough to compete with all but the fastest DSPs.

The significant boost in speed comes at the cost of a major increase in die area. ARM states that the Cortex-A8 occupies up to 3 mm² when fabricated in a 65 nm process. By comparison, an ARM11 fabricated in an older 130 nm process, with features twice as large, also occupies about 3 mm². If both cores were fabricated in the same process, the Cortex-A8 would clearly be much larger.
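
The area comparison can be made concrete with ideal scaling, under which die area shrinks with the square of the feature size. This is an assumption: real process shrinks rarely achieve ideal scaling, and the area figures quoted above are approximate.

```python
def scale_area(area_mm2: float, from_nm: float, to_nm: float) -> float:
    """Ideal die area after a process shrink (hypothetical helper).

    Assumes area scales with the square of the feature size, which is an
    idealization; real shrinks usually save less area than this.
    """
    return area_mm2 * (to_nm / from_nm) ** 2


if __name__ == "__main__":
    arm11_at_65nm = scale_area(3.0, 130, 65)  # ARM11: about 3 mm^2 at 130 nm
    ratio = 3.0 / arm11_at_65nm               # Cortex-A8: up to 3 mm^2 at 65 nm
    print(f"ARM11 at 65 nm (ideal): {arm11_at_65nm:.2f} mm^2")
    print(f"Cortex-A8 / ARM11 area ratio: about {ratio:.0f}x")
```

Halving the feature size from 130 nm to 65 nm would ideally shrink the ARM11 to about 0.75 mm², a quarter of its original area, making the Cortex-A8 roughly four times larger. Because real shrinks fall short of ideal, the actual ratio is probably somewhat lower, but still large.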

The Cortex-A8 also faces hurdles in the area of development infrastructure. For example, system designers often select a processor based on the availability of off-the-shelf software, and the Cortex-A8 falls short in this area. The NEON extensions use completely new instructions, so existing software will have to be reworked to take advantage of these extensions. This puts the Cortex-A8 at a disadvantage against popular DSPs that have good software support.

It is likely that ARM's third-party partners will eventually offer extensive signal-processing software for the Cortex-A8. However, it may take some time for these partners to build up their libraries. The Cortex-A8 is far more complicated than earlier ARM cores. As a result, writing optimized code for the Cortex-A8 is likely to be much more difficult, and therefore more time-consuming, than optimizing code for earlier ARM cores. In recognition of this fact, ARM plans to offer a vectorizing compiler for the Cortex-A8 by early 2007. If this compiler can create efficient code, the Cortex-A8 will have a significant advantage over its competitors. However, creating a vectorizing compiler for an architecture with a deep, superscalar pipeline and a complex instruction set is likely to be difficult, and there is no guarantee that ARM will succeed.
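
As a rough illustration of what a vectorizing compiler does, the sketch below shows a dot product written as a plain scalar loop and the same computation strip-mined into four-lane chunks with a scalar tail. The four-lane width is an assumption chosen to match the four-MAC figure quoted earlier; Python is used only to show the loop transformation, not actual NEON code or the output of ARM's compiler.

```python
LANES = 4  # hypothetical vector width: four 16-bit lanes


def dot_scalar(a: list[int], b: list[int]) -> int:
    """Scalar loop: one multiply-accumulate per iteration."""
    acc = 0
    for x, y in zip(a, b):
        acc += x * y
    return acc


def dot_vectorized(a: list[int], b: list[int], lanes: int = LANES) -> int:
    """Strip-mined form of the same loop, as a vectorizer might restructure it.

    The main loop processes `lanes` elements per step, keeping one partial
    accumulator per lane; a horizontal reduction combines the lanes, and a
    scalar tail loop handles elements that do not fill a whole vector.
    """
    n = min(len(a), len(b))
    main = n - n % lanes
    partial = [0] * lanes
    for i in range(0, main, lanes):
        # One "vector MAC": each lane updates its own accumulator.
        for lane in range(lanes):
            partial[lane] += a[i + lane] * b[i + lane]
    acc = sum(partial)          # horizontal reduction across lanes
    for i in range(main, n):    # scalar tail
        acc += a[i] * b[i]
    return acc


if __name__ == "__main__":
    a, b = list(range(10)), list(range(10, 20))
    print(dot_scalar(a, b), dot_vectorized(a, b))
```

Vectorizing the reduction changes the order of the additions. That is harmless for integers in this model (which ignores the 16-bit overflow and widening that real NEON code must handle), but for floating-point data a compiler generally needs permission to reorder operations, which is one of the difficulties such a compiler faces.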

It is also notable that ARM is attempting to push the Cortex-A8 beyond ARM’s traditional markets such as mobile phones. Many of the target markets—such as set-top boxes—are dominated by high-performance general-purpose processors including MIPS, PowerPC, and x86 processors. In such markets, competing general-purpose processors are likely to have much better support in areas such as off-the-shelf software and reference designs.

In summary, the Cortex-A8 gives the ARM architecture a huge leap in speed, particularly for signal-processing applications. This increased speed will allow ARM to compete in many new markets. However, the Cortex-A8 will only become truly competitive if it gains solid tool and software support.

The Cortex-A8 is available now. ARM plans to offer a new version of its software development tools with support for NEON in early 2006, and plans to offer a vectorizing compiler by early 2007. RTL and SystemC models of the core are expected to be available in early 2006, and development boards are expected to be available in late 2006.
