NVIDIA and Qualcomm ARM Up Against Competitors
NVIDIA and Qualcomm, two leading ARM licensees and SoC implementers for high-volume consumer electronics systems, are now sampling their latest-generation mobile application processors. Both companies recently published documentation describing the unique design techniques and features of their SoCs.
A Tegra Quintuplet
NVIDIA's current in-production product line consists of two generations of Tegra-branded devices. The initial Tegra family is a series of single-core devices based on the ARM11, with varying maximum clock rates, supported DDR SDRAM speed bins, and amounts and types of peripheral integration. It saw little success apart from Microsoft's short-lived Kin cellular handsets and the recently discontinued Zune HD portable multimedia player.
Tegra 2 has been somewhat more popular, achieving design wins in a number of leading-edge smartphones and Android-based tablets. In developing it, NVIDIA bypassed the ARM Cortex-A8 generation and went straight from ARM11 to a dual-core Cortex-A9. Strictly speaking, Tegra 2 is a three-ARM-core SoC; the die also includes an ARM7 for overall chip management purposes (Figure 1). You can power-gate and otherwise suspend both Cortex-A9 cores, but you can't shut them down individually.
Figure 1. NVIDIA's Tegra 2 combines two ARM Cortex-A9 cores with an ARM7 for SoC management tasks.
NVIDIA's Cortex-A9 implementation includes the optional FPU (floating-point unit) but leaves out the optional NEON 64/128-bit hybrid SIMD (single-instruction/multiple-data) MPE (media processing engine). Tegra 2's on-chip graphics processor is also internally developed; in this respect, NVIDIA is unlike most ARM processor licensees, who also license a GPU core from a company such as Imagination Technologies (the PowerVR line) or ARM itself (Mali). Then again, that decision isn't terribly surprising when you consider that NVIDIA is first and foremost a graphics accelerator technology and product developer.
At January's Consumer Electronics Show, in sync with Microsoft's unveiling of optional ARM support in the upcoming Windows 8, NVIDIA outlined Project Denver, an upcoming SoC (presumably based on Cortex-A15) intended to "power future products ranging from personal computers to servers and supercomputers." The company did not give (and has not since given) any indication of when Project Denver might see the light of day. Just one month later at the Mobile World Congress show, however, NVIDIA was far more specific about a chip that had reportedly seen first silicon only 12 days earlier and was already running various graphics and video demos.
That chip is "Kal-El," named after Superman's birth name, and it is the first in a series of superhero-themed chips to come from the company. Like Tegra 2, Kal-El (which will presumably become Tegra 3) is fabricated on a 40 nm process. And like Tegra 2, it's built from Cortex-A9 cores, although in this case NVIDIA said in mid-February that there were four of them. Unlike Tegra 2, however, each core is NEON SIMD-equipped this time around. To keep the die size down, the cores share a unified 1 Mbyte L2 cache, the same size as Tegra 2's, along with a single 32-bit system-memory interface reminiscent of Tegra 2's. But the integrated GPU is beefier than before: a 12-core unit versus its predecessor's 8 cores.
Still, the two additional Cortex-A9 CPUs and more robust GPU aren't enough to explain why Kal-El's die is 31 mm² larger than Tegra 2's (80 mm² versus 49 mm²). NVIDIA finally fessed up to the source of the discrepancy in late September, via a pair of PDF white papers published on the company website, "The Benefits of Quad Core CPUs in Mobile Devices" and "Variable SMP – A Multi-Core CPU Architecture for Low Power and High Performance." As you'll learn when you peruse the documents, Kal-El contains yet another ARM Cortex-A9 core, for five in total.
Figure 2. The upcoming NVIDIA Kal-El (Tegra 3), formerly described as a quad-core SoC, actually contains five Cortex-A9 cores.
According to NVIDIA's documentation, the company's foundry partner offers two versions of its 40 nm process. One, referred to as "LP" (low power), uses low-leakage but slow transistors, limiting circuits to a maximum switching speed of roughly 500 MHz. The other, "G" (general purpose), is capable of GHz-plus speeds but is leaky and therefore consumes more power. Even more interesting, according to NVIDIA, LP and G transistors can be combined on a single die. Stripped of marketing hype, though, what NVIDIA may actually be describing is a single process in which the company selectively uses low- or high-threshold-voltage transistors for a given circuit block.
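The LP-versus-G trade-off follows from the standard first-order CMOS power model, shown here as a textbook approximation rather than anything taken from NVIDIA's documents:

```latex
P_{\text{total}} \approx \underbrace{\alpha\, C\, V_{DD}^{2}\, f}_{\text{dynamic (switching)}} \;+\; \underbrace{V_{DD}\, I_{\text{leak}}}_{\text{static (leakage)}}
```

Here α is the switching activity factor, C the switched capacitance, V_DD the supply voltage, f the clock frequency, and I_leak the leakage current. A lightly loaded or idle core has small α and f, so the leakage term dominates and the LP transistors' low I_leak pays off; at GHz-plus clock rates the dynamic term dominates and the faster, leakier G transistors earn their keep. Lowering a transistor's threshold voltage speeds it up but raises I_leak roughly exponentially, which is the trade-off behind choosing low- or high-threshold transistors per circuit block.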
For Kal-El, NVIDIA fabricates the IC foundation, including the newly revealed fifth Cortex-A9 core, with low-power transistors, and it relies on high-performance transistors for the four primary Cortex-A9 cores (which, this time around, are individually power-gateable). In light-workload situations, such as when a tablet is in standby mode, Kal-El turns off power to all four high-performance Cortex-A9 cores and runs on the low-power core alone. When the workload exceeds a certain threshold, an on-chip scheduler switches over to one or more high-performance cores within 2 ms, turning off the low-power core in the process. All of this core juggling occurs without operating-system intervention or even awareness.
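The switching behavior described above can be approximated as a simple hysteresis-based governor. This is purely an illustrative sketch: the text specifies only "a certain threshold" and a 2 ms switch time, so the names `CoreConfig` and `next_config`, the companion-core capacity ratio, and both threshold fractions are hypothetical assumptions, and Kal-El's real policy runs in on-chip logic, not Python. Workload "demand" is modeled in units of one high-performance (G) core's capacity.

```python
"""Illustrative model of companion-core switching in a Kal-El-style SoC.

Every constant and name here is a hypothetical assumption for illustration;
it is not NVIDIA's published policy or interface.
"""
import math
from dataclasses import dataclass

# Hypothetical: companion (LP) core throughput relative to one G core,
# loosely based on ~500 MHz LP vs. GHz-plus G clock rates.
COMPANION_CAPACITY = 0.5
# Hypothetical hysteresis fractions of the companion core's capacity.
UP_FRACTION = 0.9    # wake G cores above 90% of companion capacity
DOWN_FRACTION = 0.6  # return to the companion core below 60% of it
MAX_G_CORES = 4      # Kal-El's four high-performance Cortex-A9 cores


@dataclass(frozen=True)
class CoreConfig:
    companion_on: bool
    g_cores_on: int  # 0..MAX_G_CORES; always 0 while companion_on is True


COMPANION_ONLY = CoreConfig(companion_on=True, g_cores_on=0)


def g_cores_needed(demand: float) -> int:
    """Smallest number of G cores (clamped to 1..MAX_G_CORES) covering demand."""
    return max(1, min(MAX_G_CORES, math.ceil(demand)))


def next_config(current: CoreConfig, demand: float) -> CoreConfig:
    """Pick the next core configuration for the given workload.

    `demand` is expressed in units of one G core's full capacity,
    so 2.5 means two and a half high-performance cores' worth of work.
    """
    if current.companion_on:
        # Stay on the low-power core until it approaches saturation.
        if demand <= COMPANION_CAPACITY * UP_FRACTION:
            return current
        # Hand off: power up enough G cores and gate the companion core.
        return CoreConfig(companion_on=False, g_cores_on=g_cores_needed(demand))
    # Already on G cores: fall back only well below the wake-up point
    # (hysteresis); otherwise power-gate G cores individually to fit demand.
    if demand < COMPANION_CAPACITY * DOWN_FRACTION:
        return COMPANION_ONLY
    return CoreConfig(companion_on=False, g_cores_on=g_cores_needed(demand))


if __name__ == "__main__":
    state = COMPANION_ONLY
    for demand in [0.1, 0.4, 0.6, 2.5, 0.4, 0.2]:
        state = next_config(state, demand)
        print(f"demand={demand:.1f} -> {state}")
```

The two separate thresholds keep a workload hovering near the switch point from bouncing between the companion core and the G cores; with these example values, a demand of 0.4 simply keeps whichever configuration is already active. The invariant that the companion core and the G cores are never powered at the same time mirrors the statement that the low-power core is turned off when the high-performance cores take over.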
Back in mid-February, NVIDIA officials predicted that Kal-El-based tablets would be in production by August, with smartphones shipping in time for the 2011 holiday shopping season. The latter prediction may still come true, but the former deadline has already come and gone, with only a few tablet prototypes surfacing to date.
A Krait Update
Qualcomm's white paper came more recently, at the beginning of this month. Qualcomm is one of a small cadre of ARM architecture (sometimes also called instruction set) licensees. This means that, while the company's ARM designs must retain full backward compatibility with the ARM instruction set, Qualcomm can also build on that instruction set with proprietary instructions, as well as make more fundamental circuit alterations and enhancements. (Coincident with the Project Denver unveiling in January, NVIDIA also announced that it had upgraded its ARM license from conventional to architectural, a change that came too late for Kal-El but will open up new options for NVIDIA's future ARM-based products.)
To date, Qualcomm has launched two generations of products based on the Scorpion microarchitecture, fabricated first on a 65 nm process and then on 45 nm; the latter lithography, among other things, enabled dual-core designs. Scorpion is ARM Version 7 instruction set-compliant; functionally, it's an intermediate step between Cortex-A8 and Cortex-A9, supporting some but not all of Cortex-A9's out-of-order instruction-execution capabilities. Scorpion-based Snapdragon SoCs also implement Cortex-A9-compliant floating-point and NEON SIMD engines; Scorpion's floating-point engine is pipelined, and its SIMD engine is fully 128-bit capable. Qualcomm couples the CPU with various iterations of Adreno GPU technology, which the company originally obtained via its January 2009 acquisition of AMD's (formerly ATI Technologies') Imageon handheld-graphics group.
Figure 3. Qualcomm's Krait will still represent a notable architecture and fabrication process advancement when it enters production next year.
At February's Mobile World Congress, Qualcomm unveiled the next-generation Krait microarchitecture and the first three products based on it, all fabricated on a 28 nm process (Figure 3). Slated to appear first is the dual-core MSM8960, which incorporates the Adreno 225 GPU. Krait retains Scorpion's compatibility with the ARM Version 7 instruction set but makes several notable microarchitecture advancements, such as the ability to fetch and decode three instructions per clock (versus two for Scorpion), full out-of-order execution, and an 11-stage pipeline (versus 10 stages for Scorpion). The move to 28 nm is also forecast to raise clock speeds, reduce power consumption, and enable more single-chip integration, including larger caches.
More recently, on October 7, Qualcomm published the white paper "Snapdragon S4 Processors: System on Chip Solutions for a New Mobile Age," which reiterates and elaborates on the information unveiled at Mobile World Congress, revealing (among other things) that each ARM core is power-gated. The company also demonstrated functional MSM8960 silicon to a small group of journalists. At the time, Qualcomm stated that it "has had MSM8960 silicon back in house for the past three months and is on-track for a release sometime in the first half of next year."
NVIDIA and Qualcomm are certainly vying with each other for customer mindshare, but they also have other competitors to contend with. Texas Instruments, for example, announced in August 2010 that it intended to be the first ARM licensee to produce (with OMAP 5) an ARM Cortex-A15-based SoC. The company currently sells a stable of high-speed Cortex-A9-based OMAP 4 offerings, which also include latest-generation PowerVR graphics cores, dual-channel external memory controllers, and the NEON MPE.
Don't forget about Apple. Although Apple isn't a direct competitor to ARM SoC suppliers, its single-core Cortex-A8-based A4 and dual-core Cortex-A9-based A5 SoCs find use in the Apple TV, iPad, iPhone, and iPod touch. Apple's competitors consequently push their SoC suppliers to at least keep pace with, if not outpace, the output of Apple's internal chip development group. At the beginning of this month, for example, Apple announced the iPhone 4S, which migrates from the iPhone 4's A4 SoC to the A5, notably boosting graphics performance and enabling impressive Siri-derived voice recognition capabilities in the form of the Assistant user interface.
And don't count out Intel, either. At early September's Intel Developer Forum in San Francisco, the company announced next-year plans for several next-generation Atom-based SoCs for handheld devices, and also revealed a close, ongoing partnership with Google around its Android operating system. To date, Intel's multi-year efforts to establish an x86 beachhead in smartphones, tablets and the like have borne little fruit. But the company has the deep financial and technical resources to keep investing in these areas, as it seems determined to do. And Intel's upcoming 22 nm process may negate any power consumption advantage that ARM licensees currently enjoy. With already-demonstrated 28 nm foundry capabilities, though, Qualcomm's Krait won't be far behind.