Next-Generation Power Architecture-Based SoCs Embrace Advanced Lithography, Core Virtualization, SIMD Instruction Set

Submitted by BDTI on Tue, 08/23/2011 - 12:02

Freescale's re-engagement with historical Power Architecture (previously known as PowerPC) CPU business segments, such as communications, industrial, medical, military, robotics, and surveillance systems, began in earnest at the June 2008 Freescale Technology Forum, when the company unveiled its first QorIQ (pronounced "Core IQ") product families. The market had long requested multi-core offerings, which until then had been addressed by only a single member of the PowerQUICC III family (the dual-core MPC8572E). In response, multi-core processors were at the core (pun intended) of Freescale's QorIQ marketing pitch in 2008 and beyond, although low-cost single-core offerings were also made available in some cases.

 

Product family      CPU core           On-chip DSP resources
PowerQUICC III      e500 (32-bit)      AltiVec (integer and floating point)
QorIQ P1 and P2     e500v2 (32-bit)    N/A
QorIQ P3 and P4     e500mc (32-bit)    N/A
QorIQ P5            e5500 (64-bit)     N/A
QorIQ Qonverge      e500mc (32-bit)    StarCore SC3850 (integer only)
QorIQ AMP           e6500 (64-bit)     AltiVec (integer and floating point)

The P1 and P2 series of ICs came first, with P3 and P4 SoCs and high-end 64-bit P5 products following them into the market. Beyond their CPU cores and associated multi-level cache schemes, QorIQ SoCs contain a host of application-tailored peripherals, such as:

·       A dedicated queue manager

·       System memory controllers along with DMA controllers

·       USB 2.0 interfaces

·       Security and encryption engines

·       An integrated SERDES network for implementing high-speed Ethernet

·       RapidIO, PCIe and similar system buses

·       etc.

In February, Freescale took integration in a somewhat different direction, unveiling the QorIQ Qonverge product line for wireless infrastructure applications, combining multiple e500mc Power Architecture cores with StarCore SC3850 integer-only DSP cores. And at the most recent Freescale Technology Forum (in June 2011), the company's product focus turned back to conventional QorIQ with the launch of the QorIQ AMP (Advanced Multiprocessing) family. Still in design, the first QorIQ AMP member, the T4240, is scheduled for initial sample availability early next year with production later in 2012.

Figure 1: QorIQ Block Diagram

More generally, Freescale plans a multi-tier product categorization approach, reminiscent of the P1-through-P5 naming of the QorIQ predecessors. The company will differentiate specific devices by the number and clock speed of e6500 64-bit Power Architecture cores, the target power consumption, the amount and variety of embedded cache memory and the degree of on-chip peripheral integration. From the press release:

The AMP series consists of three levels of products within a scalable portfolio, initially spanning from ultra-high-performance processors featuring 24 virtual cores down to single-core products.

Control plane processors (service provider routers, storage networks)

·       Up to 6 cores running at up to 2.5 GHz

·       Greater than 6 MB L2 cache

High-end data plane processors (routers, switches, access gateways, mil/aero applications)

·       Up to 24 virtual cores running at up to 2.0 GHz

·       50 Gbps IP forwarding capability

·       Advanced application acceleration

Low-end data plane processors (media gateways, network attached storage, integrated services router)

·       Up to 8 virtual cores running up to 1.6 GHz

·       Advanced application acceleration

·       Less than 10W power

QorIQ AMP products will be Freescale's first devices built on its latest-generation 28 nm process; unlike some other semiconductor suppliers, the company chose to skip the intermediate 32 nm node. Freescale's previous 45 nm process employs SOI (silicon-on-insulator) technology to reduce parasitic device capacitance and thereby improve performance. This time, however, Lisa Su (Freescale's senior vice president and general manager of the networking and multimedia division) revealed during a 2011 Forum briefing that the company felt conventional bulk CMOS fabrication represented an acceptable approach. Although Su declined to comment in detail, such a strategy presumably leads to a lower-cost silicon foundation for the new products and expands the number of foundries available to manufacture them.

Compared with the 64-bit e5500 Power Architecture core employed in prior-generation P5-series QorIQ products, the QorIQ AMP's 64-bit e6500 contains several notable alterations and enhancements. Instead of the e5500's 512 KBytes of private L2 cache per core, each e6500 multi-core cluster shares a larger L2 cache (its size is as yet unannounced and device-dependent), and all clusters in turn access a common on-chip L3 cache. The e6500 core, which Freescale claims will run at up to 2.5 GHz, is also multi-threaded, a first for Freescale. As with Intel's Hyper-Threading scheme, each e6500 physical core appears as two virtual cores. However, Su claimed that because each e6500 physical core duplicates comparatively more functional logic, its virtual cores deliver much higher performance than Intel's and others' simultaneous multithreading schemes have historically achieved, not only on carefully crafted benchmark software snippets but also across a range of real-life application code suites.

Equally notable, the e6500 core marks the resurrection of AltiVec instruction set support, which was absent from prior QorIQ devices. As Freescale conceptually announced last fall, the company has functionally enhanced the AltiVec vector processor found in historical PowerQUICC products, resulting in a 128-bit SIMD (single instruction, multiple data) engine that operates independently of the core's scalar processor and FPU (floating point unit). During the 2011 Forum press briefing, Su declined to explain in detail why AltiVec had returned to the Freescale fold, offering instead the general comment that the company evaluates the applicability of various features when architecting each product generation, and that this time it made sense to re-integrate AltiVec capabilities.
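AltiVec's value for DSP-style work comes from its 128-bit registers: a single instruction operates on every element packed into a register at once. The Python sketch below only models that lane-wise behavior; it is not AltiVec code (real AltiVec code would typically be written in C using intrinsics such as vec_add from altivec.h). The helper names lanes, pack_u16 and simd_add_u16_modulo are hypothetical, and the choice of unsigned 16-bit lanes with wraparound arithmetic is an illustrative assumption.

```python
import struct

VECTOR_BITS = 128  # SIMD register width described in the article


def lanes(element_bits: int) -> int:
    """How many elements of a given width fit in one 128-bit vector."""
    return VECTOR_BITS // element_bits


def pack_u16(values: list[int]) -> bytes:
    """Pack eight unsigned 16-bit elements into one 16-byte (128-bit) value.

    Big-endian byte order is an assumption; it does not change the
    lane-wise results computed below.
    """
    if len(values) != lanes(16):
        raise ValueError("a 128-bit vector holds exactly eight 16-bit lanes")
    return struct.pack(">8H", *values)


def unpack_u16(vector: bytes) -> list[int]:
    """Split a 16-byte value back into its eight 16-bit lanes."""
    return list(struct.unpack(">8H", vector))


def simd_add_u16_modulo(a: bytes, b: bytes) -> bytes:
    """Model of one modular vector add: all eight lanes update together.

    Each lane wraps at 2**16 on its own; no carry propagates between lanes.
    """
    sums = [(x + y) & 0xFFFF for x, y in zip(unpack_u16(a), unpack_u16(b))]
    return pack_u16(sums)


if __name__ == "__main__":
    a = pack_u16([1, 2, 3, 4, 5, 6, 7, 65535])
    b = pack_u16([10] * 8)
    print(unpack_u16(simd_add_u16_modulo(a, b)))  # [11, 12, 13, 14, 15, 16, 17, 9]
    print(lanes(8), lanes(16), lanes(32))         # 16 8 4
```

The last lane shows the defining SIMD property: 65535 + 10 wraps to 9 inside its own lane without disturbing its neighbors, and the same 128 bits could instead be viewed as sixteen 8-bit or four 32-bit elements.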

Subsequent feedback from Freescale, in response to my request for background on why AltiVec had been removed from the first-generation QorIQ and why it had then been resurrected in QorIQ AMP, was only marginally more enlightening:

Design cycles in the military and aerospace markets are somewhat longer than other markets, so for many mil/aero applications, the AltiVec technology in PowerQUICC offerings could suffice. Having said that, customer demand for AltiVec has been strong and we are happy to carry it forward in the AMP Series. In addition, more stringent performance requirements for applications such as video surveillance, networking, telecom security and printing/imaging make the DSP-like capabilities of AltiVec increasingly useful for additional markets.

Freescale's third attempt at an explanation, however, was substantially more insightful:

As mentioned prior, design cycles in the mil/aero areas are longer than other markets, so for many mil/aero applications, the AltiVec technology in PowerQUICC offerings could suffice to serve A&D [Editor note: aerospace and defense] for a significant period of time. With this in mind, as we began the system architecture of the QorIQ family five years ago, we considered all SoC design trade-offs related to die size, performance and power consumption. The broad availability of AltiVec across our PowerQUICC family, together with our desire to lead the market in power, performance, cost, and die size in multi-core SoCs for networking gear, resulted in the first generation of QorIQ...which has seen extremely strong market acceptance, helping to drive our recent year-over-year market share gain in embedded processors for communications equipment. In the mean time, more application spaces are now looking for the kind of capabilities and performance that AltiVec offers. Fortunately for Freescale, we have well-established AltiVec technology and have implemented it in the new QorIQ AMP processors. Not only will this new processor family serve these growing networking-and-other applications, it will provide better performance for the A&D market.

The re-inclusion of AltiVec will inevitably lead to interesting product selection decisions for Freescale's customers going forward. Do they go with QorIQ Qonverge products, which contain fewer prior-generation CPU cores but include dedicated DSP cores (albeit with integer-only support), or with the newer and more advanced Power Architecture-based QorIQ AMP devices, whose SIMD engine handles both integer and floating-point DSP algorithms?

QorIQ AMP retains many prior-generation product features, such as the CoreNet interconnection topology, cache coherency, hardware hooks for robust O/S and application virtualization, and deep-visibility on-chip debugging features. Other function-specific logic blocks have been enhanced to meet evolving application demands. Again, from the press release:

Complementing the programmable e6500 cores is a broad range of highly advanced acceleration engines and co-processing technologies, including enhanced security, pattern matching and compress/decompress engines, as well as Freescale's proven data path acceleration (DPAA) technology. The AMP series' compression/decompression technology provides 20 Gbps of performance, and a new SEC 5.0 crypto accelerator offloads protocol processing, including LTE, IPSec, and SSL, at up to 40 Gbps while delivering nearly 140 Gbps of raw crypto hardware acceleration for current and emerging wireless and wireline algorithms. Other new acceleration/offload technologies are incorporated to support regex acceleration, 128-bit SIMD data prefetching, in-line parsing and classification, and quality of service functionalities.
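To make the pattern-matching and regex workload concrete, here is a minimal software-only sketch of signature scanning, the kind of per-packet work that a hardware pattern-matching engine is intended to take off the CPU cores. It does not represent the engine's programming interface or rule format, which the press release does not describe. The signature names and byte patterns are made up for illustration and are not real malware signatures.

```python
import re

# Hypothetical signatures for illustration only: the names and byte patterns
# are invented and are not a real rule set or any Freescale rule format.
SIGNATURES = {
    "demo-shell-string": re.compile(rb"/bin/sh\x00"),
    "demo-path-traversal": re.compile(rb"GET /[^ ]*\.\./"),
}


def scan_payload(payload: bytes) -> list[str]:
    """Return the names of all signatures that occur anywhere in a payload.

    In software this regex search runs on a general-purpose core for every
    packet; that recurring cost is what a pattern-matching offload targets.
    """
    return [name for name, pattern in SIGNATURES.items() if pattern.search(payload)]


if __name__ == "__main__":
    print(scan_payload(b"GET /../../etc/passwd HTTP/1.1"))  # ['demo-path-traversal']
    print(scan_payload(b"GET /index.html HTTP/1.1"))        # []
```

Every packet must be checked against every signature, so scanning cost grows with both traffic rate and rule count; that scaling is why such matching is a candidate for dedicated hardware at multi-gigabit line rates.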

The pattern-matching engine is useful, for example, for identifying and responding to virus signatures present in network packets, or for managing access to various network servers, services, protocols and ports. And here's more detail on the compress/decompress engine, from the QorIQ Fact Sheet:

The AMP series of processors introduce the decompress/compress engine (DCE) targeting the data center with its need to transfer large blocks of data across the infrastructure. The DCE supports the raw DEFLATE algorithm (RFC1951), GZIP format (RFC1952) and ZLIB format (RFC1950), as well as Base64 encoding and decoding (RFC4648).
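The three compression formats the DCE supports share the same DEFLATE-compressed payload and differ only in the header and trailer wrapped around it. As a software reference point (not the DCE's programming interface, which the fact sheet does not describe), Python's standard zlib module can produce all three by varying its wbits argument, and its base64 module covers the RFC 4648 encoding. The helper names and the compression level of 6 are arbitrary choices for this sketch.

```python
import base64
import zlib

# zlib's wbits argument selects which wrapper surrounds the DEFLATE data:
#   -15 -> raw DEFLATE stream (RFC 1951), no header or trailer
#    15 -> ZLIB format (RFC 1950): 2-byte header, Adler-32 trailer
#    31 -> GZIP format (RFC 1952): gzip header, CRC-32 and length trailer (16 + 15)
WBITS = {"raw-deflate": -15, "zlib": 15, "gzip": 31}


def compress(data: bytes, fmt: str) -> bytes:
    """Compress data with DEFLATE and wrap it in the requested format."""
    c = zlib.compressobj(level=6, method=zlib.DEFLATED, wbits=WBITS[fmt])
    return c.compress(data) + c.flush()


def decompress(blob: bytes, fmt: str) -> bytes:
    """Inverse of compress(); the wbits value must match the wrapper used."""
    return zlib.decompress(blob, wbits=WBITS[fmt])


if __name__ == "__main__":
    block = b"QorIQ AMP " * 200
    for fmt in WBITS:
        packed = compress(block, fmt)
        assert decompress(packed, fmt) == block
        print(f"{fmt:12s} {len(block)} -> {len(packed)} bytes")
    # Base64 (RFC 4648) is an encoding, not compression: every 3 input
    # bytes become 4 output characters.
    encoded = base64.b64encode(block[:30])
    print(len(encoded), base64.b64decode(encoded) == block[:30])
```

Because only the framing differs, an engine that implements DEFLATE once can plausibly serve all three formats by adding the appropriate header, trailer and checksum.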

Freescale claims notable power consumption improvements from the 45 nm-to-28 nm process migration alone. The company has additionally outfitted QorIQ AMP with six levels of power-mode granularity, independently controllable per core and per function-specific processing "engine", along with fine-grained clock speed control. Freescale predicts that, together, these capabilities will reduce power consumption by up to 50% versus prior-generation devices with comparable performance.

The flagship QorIQ AMP T4240 embeds 12 dual-threaded e6500 cores, for a total of 24 simultaneously executing threads. Citing improved per-thread DMIPS efficiency and higher clock speeds, Freescale forecasts that the T4240 will deliver a 4x performance improvement over the QorIQ P4080 along with a 2x gain in power efficiency. Actual results, both absolute and relative to prior-generation devices from Freescale and its competitors, will have to wait for BDTI Benchmark analysis once functional silicon is in hand. Freescale plans to unveil additional T4240 and follow-on product specifics later this year; more generally, the company "plans to introduce additional AMP products during each quarter following the initial product launch, filling out the platform across three tiers."
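Freescale's T4240 figures can be cross-checked with simple arithmetic. Assuming "power efficiency" means performance per watt (our reading; the press materials do not define the term), with \eta denoting efficiency and P denoting power, the two forecasts together imply the following:

```latex
\begin{aligned}
N_{\text{threads}} &= 12~\text{cores} \times 2~\text{threads/core} = 24 \\
\frac{P_{\text{T4240}}}{P_{\text{P4080}}}
  &= \frac{\mathrm{Perf}_{\text{T4240}} / \mathrm{Perf}_{\text{P4080}}}
          {\eta_{\text{T4240}} / \eta_{\text{P4080}}}
   = \frac{4}{2} = 2,
  \qquad \text{where } \eta = \mathrm{Perf} / P
\end{aligned}
```

Under that reading, the T4240 would draw roughly twice the P4080's power while doing four times the work. This is an inference from Freescale's forecasts, not a published specification.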
