ARM Cortex-M7: Digital Signal Processing Drives Family Evolution

ARM's Cortex-A series of high-performance CPU cores garners significant attention by virtue of its use in high-volume, high-visibility smartphones, tablets, and other consumer electronics devices. But the company's Cortex-M and Cortex-R processor families, which target embedded applications, are even more widely used. The latest Cortex-M family member, the just-announced Cortex-M7, further boosts performance, especially in floating-point and other digital signal processing applications, blurring historical distinctions among the Cortex-M, Cortex-R and Cortex-A families (Figure 1).


Figure 1. Historical application boundaries between ARM's three primary CPU core families are becoming increasingly blurred as the "low end" Cortex-M line steadily improves in performance.

By way of review, Cortex-M cores are intended for microcontroller workloads in deeply embedded applications, where silicon area and power consumption are at a premium. As such, they tend to run at slower clock speeds and are built on trailing-edge (alternatively stated: more mature) manufacturing processes compared to Cortex-A cores. Cortex-M cores also run a mix of 16- and 32-bit Thumb instructions, with the exact instruction set support (ARMv6-M or ARMv7-M) depending on the CPU variant, and implement 32-bit addressing, whereas Cortex-A cores run 32-bit software and ARM is in the process of migrating them to 64-bit addressing capabilities. Cortex-R is a 32-bit address-and-data family, also intended for microcontroller applications, but with added real-time support facilities.

As revealed during a recent briefing with ARM's Ian Johnson (Senior Product Manager, CPU Group), with Thomas Ensergueix (Senior Product Marketing Manager, CPU Group) also in attendance, ARM's partners (175 licensees, having taken more than 240 licenses) have shipped more than 8 billion Cortex-M cores since the launch of the Cortex-M3 roughly a decade ago. Approximately 2.9 billion Cortex-M-based embedded processors shipped last year, followed by 1.7 billion more in the first half of this year alone.

The Cortex-M4, unveiled in 2010, built on the Cortex-M3 foundation with a set of instruction set extensions explicitly tailored for digital signal processing, along with an optional single-precision floating-point unit (if included, the core is known as the Cortex-M4F). The new Cortex-M7 further expands the family's floating-point facilities with a double-precision option, and when the FPU is present, integer and floating-point instructions can now be issued simultaneously. ARM also lengthened the Cortex-M7's processing pipeline relative to its Cortex-M3 and Cortex-M4 predecessors, from three stages (single-issue) to six (superscalar, in-order) (Figure 2).
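One practical reason the double-precision option matters for signal processing code is accumulated rounding error. The sketch below assumes a C toolchain with IEEE 754 `float` and `double` types and uses two hypothetical helpers (`elapsed_seconds_f32` and `elapsed_seconds_f64`, not part of any ARM library) that track elapsed time by adding one sample period per sample. On a Cortex-M4F the single-precision loop can run on the FPU, whereas the double-precision loop would typically be compiled to software floating-point routines; the Cortex-M7's optional double-precision FPU is what allows the second version to run in hardware.

```c
#include <assert.h>

/* Hypothetical helper: accumulate elapsed time one sample period at a
 * time in single precision. On a single-precision-only FPU (such as the
 * Cortex-M4F) each add here can execute in hardware. */
float elapsed_seconds_f32(float sample_rate_hz, unsigned long n_samples)
{
    assert(sample_rate_hz > 0.0f);
    const float period = 1.0f / sample_rate_hz;
    float elapsed = 0.0f;
    for (unsigned long i = 0; i < n_samples; ++i) {
        /* Stops growing once period is less than half the spacing
         * between adjacent floats at the current magnitude. */
        elapsed += period;
    }
    return elapsed;
}

/* Same accumulator in double precision. Without a double-precision FPU,
 * compilers typically lower these adds to software floating-point
 * library calls; with a double-precision FPU they can run in hardware. */
double elapsed_seconds_f64(double sample_rate_hz, unsigned long n_samples)
{
    assert(sample_rate_hz > 0.0);
    const double period = 1.0 / sample_rate_hz;
    double elapsed = 0.0;
    for (unsigned long i = 0; i < n_samples; ++i) {
        elapsed += period;
    }
    return elapsed;
}
```

Called with an assumed 48 kHz sample rate and 28,800,000 samples (ten minutes of audio), the double-precision version returns a value very close to 600 s, while the single-precision accumulator stops growing well short of that (around 512 s on a typical IEEE 754 host), because the roughly 20.8 µs increment eventually falls below half the spacing between adjacent `float` values at that magnitude.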


Figure 2. ARM's latest Cortex-M7 expands on its Cortex-M4 predecessor's capabilities with a twice-as-deep and superscalar instruction execution pipeline, integrated instruction and data caches, and optional double-precision (versus precursor single-precision-only) floating-point support, among other enhancements.

You might expect the deeper pipeline to also enable faster clock speeds (at the cost of larger branch misprediction penalties), and according to Ensergueix and Johnson, you'd be right... but only eventually. That's because the processes on which Cortex-M7-based microcontrollers will be built must also support embedded flash memory, given the now-common customer requirement for in-system updateable firmware, and are therefore several generations removed from the leading-edge lithographies used for Cortex-A-based SoCs.

Fundamental clock speed improvements were actually not a key focus for ARM in developing the Cortex-M7, according to Ensergueix and Johnson. Nonetheless, the company believes that the Cortex-M7 will deliver up to twice the performance of the Cortex-M4 on digital signal processing-centric code, particularly code built around 32-bit floating-point instructions (Figure 3). ARM has also focused on improving the Cortex-M7's instructions-per-clock (IPC) efficiency versus its predecessors. The inclusion, for the first time on a Cortex-M family member, of up to 64 KB each of instruction and data cache is one key way the company accomplished this objective, along with support for tightly coupled memory (TCM) interfaces to memory arrays outside the core.
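The trade-off between clock speed and IPC can be made explicit with the textbook "iron law" of processor performance. The symbols below are introduced here for illustration (they are not ARM-published figures), and the comparison assumes both cores execute the same instruction count N for a given workload:

```latex
T_{\mathrm{exec}} = \frac{N \cdot \mathrm{CPI}}{f_{\mathrm{clk}}},
\qquad
\mathrm{CPI} = \frac{1}{\mathrm{IPC}} \approx \mathrm{CPI}_{\mathrm{base}} + b \, m \, P,
\qquad
\frac{T_{\mathrm{M4}}}{T_{\mathrm{M7}}} = \frac{\mathrm{CPI}_{\mathrm{M4}}}{\mathrm{CPI}_{\mathrm{M7}}} \cdot \frac{f_{\mathrm{M7}}}{f_{\mathrm{M4}}}
```

Here b is the fraction of instructions that are branches, m the misprediction rate, and P the misprediction penalty in cycles, which tends to grow with pipeline depth. At equal clock frequency, a 2× speedup therefore requires roughly halving CPI (doubling IPC), which is what features such as dual issue, caches and TCM aim at.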


Figure 3. ARM believes that the Cortex-M7 will crunch code up to twice as fast as the Cortex-M4 (as well as competitors' cores), particularly with digital signal processing-centric software based on 32-bit floating point instructions.

Right now, ARM believes that most Cortex-M7-based microcontrollers will be fabricated on 90 nm processes, where the CPU core will run at approximately 200 MHz. Embedded flash memory processes at 55 nm are now ramping; on them, the Cortex-M7 will run at an estimated 300 MHz maximum. 40 nm processes (with a projected 400 MHz Cortex-M7 clock speed) will follow in a few years, with a 28 nm embedded flash successor process likely available in volume by the end of the decade.
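To put these projected clock rates in signal processing terms, the sketch below computes a per-sample cycle budget: the number of core clock cycles available to process each input sample in real time. The helper name `cycles_per_sample` and the 48 kHz example rate are illustrative assumptions, and the budget ignores interrupt overhead, flash wait states and multi-channel processing.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: CPU cycles available per input sample when a core
 * clocked at clk_hz must finish each sample before the next one arrives
 * at sample_rate_hz. Integer division rounds the budget down. */
uint32_t cycles_per_sample(uint32_t clk_hz, uint32_t sample_rate_hz)
{
    assert(sample_rate_hz > 0);
    return clk_hz / sample_rate_hz;
}
```

With a 48 kHz input, the 200, 300 and 400 MHz estimates work out to 4,166, 6,250 and 8,333 cycles per sample respectively (rounded down); how much filtering or transform work fits in each budget then depends on the core's IPC.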

Even though Cortex-M7's performance potential is to some degree held back by its fabrication foundation, it's still competitive with low-end Cortex-A cores such as the Cortex-A5. How, then, do you choose between them? One answer, as pointed out by ARM's Jem Davies (Fellow and VP of Technology) and Chris Porthouse (Director of Market Development) in a follow-on briefing, involves operating system and associated application development tool support. Cortex-A cores, as alluded to by Figure 1, include memory management units (MMUs) and are therefore supported by "rich" operating systems such as Android and other Linux variants. Go with a Cortex-M core, on the other hand, and your code development may take more work, but it may also end up "tighter" as a result...and you also won't be spending precious silicon area on an MMU that you may not even need.

Other historical differences between the two families are market-driven, rather than ARM-defined. For example, Cortex-M-based chips tend to be coupled with 2D graphics cores, Ensergueix and Johnson suggested, whereas Cortex-A SoCs and their "richer" applications harness higher-end 3D GPUs and software drivers from ARM (Mali) and other companies such as Imagination Technologies. A similar Cortex-A predominance tends to hold true, said Davies and Porthouse, with the company's function-specialized cores such as the V500 video processor and DP500 display processor.

But there's nothing fundamental from a technical standpoint that would preclude an SoC designer from mating a Cortex-M CPU core with a higher-end graphics, video, display or other co-processor, and the Cortex-M7's full-featured 64-bit AMBA 4 AXI master interface will help optimize the performance of the connection to them. Consider, for example, a dedicated videoconferencing device, content to run on an embedded RTOS but able to harness the resources of the V500 core's video encode and decode acceleration engines. Plenty of other examples like this exist, and you've got the "green light" to proceed with evaluating them.

The Cortex-M7 core is now available for evaluation and licensing, and according to ARM is fully supported by the company's Keil microcontroller development kit (MDK), which integrates the ARM compilation tools with the Keil µVision IDE and debugger. Additional industry support is already available or under active development from development tool and operating system partners such as Express Logic, FreeRTOS, IAR Systems, Atollic, DSP Concepts, Mentor Graphics, Micrium and SEGGER.
