Qualcomm Reveals Details on Scorpion Core
Back in 2005, Qualcomm announced that it had licensed the ARMv7 instruction set architecture and was working with ARM to create its own high-performance core based on that architecture. The new core was dubbed “Scorpion,” and at the time it was announced, Qualcomm didn’t disclose much about it except that it would run at 1 GHz in a 65 nm process and would be customized to provide a high level of performance and energy efficiency in its target mobile applications. Exactly how this combination would be achieved was not revealed, which is typical of Qualcomm; historically, the company has disclosed few details about the processor cores that live inside its chips.
Then in 2006, Qualcomm announced a new chip platform, “Snapdragon,” in which the Scorpion core would be used alongside several other processors and co-processors. According to Qualcomm, Snapdragon will serve a range of high-performance mobile applications, such as high-end smartphones and mobile internet devices. Still, there was little information about the Scorpion core itself.
In conference presentations this year, however, Qualcomm popped the hood on the Scorpion core and presented a detailed description of the core’s microarchitecture and implementation. The Scorpion core (shown in Figure 1) is similar to ARM’s Cortex-A8, which also implements the ARMv7 architecture. Like the Cortex-A8, Scorpion is a superscalar, dual-issue machine, and supports the powerful, signal-processing-oriented NEON instruction set extensions and VFPv3 floating-point extensions (referred to collectively on Scorpion as the “VeNum” media processing engine). Scorpion will be supported by ARM’s standard software development tools, and Qualcomm expects to offer off-the-shelf multimedia codec software that uses VeNum.
Figure 1. Scorpion core block diagram.
Although Scorpion and Cortex-A8 have many similarities, the information released by Qualcomm shows that the two cores differ in a number of interesting ways. For example, while the Scorpion and Cortex-A8 NEON implementations execute the same SIMD-style instructions, Scorpion’s implementation can process 128 bits of data in parallel, compared to 64 bits on Cortex-A8. Half of Scorpion’s SIMD data path can be shut down to conserve power. Scorpion’s pipeline is deeper: it has a 13-stage load/store pipeline and two integer pipelines. One integer pipeline is 10 stages long and can perform simple arithmetic operations (such as adds and subtracts); the other is 12 stages long and can perform both simple and more complex arithmetic, such as MACs. Scorpion also has a 23-stage floating-point/SIMD pipeline, and unlike on Cortex-A8, VFPv3 operations are pipelined. Scorpion uses a number of other microarchitectural tweaks that are intended to either boost speed or reduce power consumption. (Scorpion’s architects previously designed low-power, high-performance processors for IBM.) The core supports multiple clock and voltage domains to enable additional power savings.
In addition to developing a custom microarchitecture, Qualcomm also customized the core’s circuit design and layout in an effort to improve energy efficiency.
Overall, Qualcomm has made a huge investment in creating a custom implementation of the ARMv7 architecture. By way of comparison, Texas Instruments customized just the layout for the Cortex-A8 for its OMAP3 chips, and it has been reported that the process took 45 engineers working for a period of years. If so, Scorpion’s development probably represents an investment on the order of tens of millions of dollars. And what’s the payoff?
At first glance, it doesn’t look like much—as noted earlier, Scorpion is expected to run at 1 GHz in a 65 nm process, which is slightly lower than the 1.1 GHz top speed that ARM currently quotes for the Cortex-A8 in 65 nm. Scorpion is quoted as providing 2100 DMIPS at 1 GHz; Cortex-A8 is quoted at 2000 DMIPS at the same speed. However, a notable difference is that the Cortex-A8 top speed is for a TSMC GP (general-purpose) process, while the Scorpion speed is for the LP (low-power) process. ARM quotes the speed of Cortex-A8 in an LP process as roughly 650 MHz, and although TI does not publicize the exact speed of the hand-crafted, low-power Cortex-A8 core used in its OMAP3 chips, BDTI has estimated that it runs at roughly 450 MHz. (BDTI’s benchmark results for the Cortex-A8 are available at BDTI’s website, www.BDTI.com.) Thus, Qualcomm expects Scorpion to run significantly faster than Cortex-A8 when both are implemented in the low-power processes commonly used for mobile applications.
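The clock-speed comparison above can be made concrete with a little arithmetic. The Python sketch below uses only the vendor-quoted and BDTI-estimated figures cited in this article; the function and constant names are illustrative, and the assumption that DMIPS scales linearly with clock speed is a simplification that ignores memory-system effects.

```python
# Figures quoted in the article (vendor claims and a BDTI estimate, not measurements).
SCORPION_DMIPS_AT_1GHZ = 2100
CORTEX_A8_DMIPS_AT_1GHZ = 2000
SCORPION_LP_MHZ = 1000      # Qualcomm target, 65 nm LP process
CORTEX_A8_LP_MHZ = 650      # ARM-quoted, 65 nm LP process (approximate)
OMAP3_CORTEX_A8_MHZ = 450   # BDTI estimate for TI's OMAP3 implementation


def dmips_per_mhz(dmips: float, clock_mhz: float) -> float:
    """Normalize a DMIPS rating by the clock speed it was quoted at."""
    return dmips / clock_mhz


def scaled_dmips(per_mhz: float, clock_mhz: float) -> float:
    """Estimate DMIPS at another clock speed, assuming linear scaling."""
    return per_mhz * clock_mhz


scorpion_rate = dmips_per_mhz(SCORPION_DMIPS_AT_1GHZ, 1000)
cortex_a8_rate = dmips_per_mhz(CORTEX_A8_DMIPS_AT_1GHZ, 1000)

# Estimated Cortex-A8 throughput at its low-power clock speeds.
cortex_a8_lp_dmips = scaled_dmips(cortex_a8_rate, CORTEX_A8_LP_MHZ)
omap3_dmips = scaled_dmips(cortex_a8_rate, OMAP3_CORTEX_A8_MHZ)

# Clock-speed advantage of Scorpion in low-power processes.
clock_ratio_vs_lp = SCORPION_LP_MHZ / CORTEX_A8_LP_MHZ
clock_ratio_vs_omap3 = SCORPION_LP_MHZ / OMAP3_CORTEX_A8_MHZ

print(f"Scorpion:  {scorpion_rate:.1f} DMIPS/MHz")
print(f"Cortex-A8: {cortex_a8_rate:.1f} DMIPS/MHz")
print(f"Cortex-A8 at {CORTEX_A8_LP_MHZ} MHz: ~{cortex_a8_lp_dmips:.0f} DMIPS")
print(f"Cortex-A8 at {OMAP3_CORTEX_A8_MHZ} MHz: ~{omap3_dmips:.0f} DMIPS")
print(f"Clock ratio vs. LP Cortex-A8: {clock_ratio_vs_lp:.2f}x")
print(f"Clock ratio vs. OMAP3 estimate: {clock_ratio_vs_omap3:.2f}x")
```

Under these assumptions, the per-MHz throughput of the two cores is nearly identical, so most of Scorpion's expected advantage in a low-power process would come from its higher clock speed.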
What about power consumption? Qualcomm claims that Scorpion will have power consumption of roughly 200 mW at 600 MHz (this figure includes leakage current, though its contribution is typically minimal in low-power processes). In comparison, ARM reports on its website that a Cortex-A8 in a 65 nm LP process consumes 0.59 mW/MHz (excluding leakage), which translates into about 350 mW at 600 MHz.
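The power comparison above is a straightforward scaling of a per-MHz figure, sketched below in Python. The names are illustrative and the inputs are the vendor-quoted figures cited above, not measurements. Note that the Scorpion figure includes leakage while the Cortex-A8 figure excludes it, so the comparison is only approximate.

```python
def power_at_clock_mw(mw_per_mhz: float, clock_mhz: float) -> float:
    """Scale a per-MHz power figure linearly to a given clock speed."""
    return mw_per_mhz * clock_mhz


CLOCK_MHZ = 600.0
CORTEX_A8_MW_PER_MHZ = 0.59     # ARM-quoted, 65 nm LP, excluding leakage
SCORPION_MW_AT_600_MHZ = 200.0  # Qualcomm-claimed, including leakage

# Cortex-A8 power at the same clock speed Qualcomm quotes for Scorpion.
cortex_a8_mw = power_at_clock_mw(CORTEX_A8_MW_PER_MHZ, CLOCK_MHZ)

# Scorpion expressed in the same per-MHz units ARM uses.
scorpion_mw_per_mhz = SCORPION_MW_AT_600_MHZ / CLOCK_MHZ

# Fraction of the Cortex-A8 power that Scorpion is claimed to consume.
power_ratio = SCORPION_MW_AT_600_MHZ / cortex_a8_mw

print(f"Cortex-A8 at 600 MHz: {cortex_a8_mw:.0f} mW")
print(f"Scorpion: {scorpion_mw_per_mhz:.2f} mW/MHz")
print(f"Scorpion / Cortex-A8 power ratio: {power_ratio:.2f}")
```

If the quoted figures hold, Scorpion would draw a little over half the power of a Cortex-A8 at the same 600 MHz clock speed.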
BDTI has not independently verified the above clock speeds or power figures, but if they are accurate, it appears that Qualcomm’s efforts have yielded significant benefits in terms of both speed and energy efficiency. Clearly, Qualcomm is betting that its investment will pay off in chip sales, and that these improvements will give Snapdragon an edge over key competitors like TI’s OMAP3430 and Freescale’s i.MX31.