Ivy Bridge: Intel's CPUs Gain a Generational Lithography Edge

Submitted by BDTI on Wed, 05/16/2012 - 19:31

Note: the earlier-published (and reversed) 'tick' and 'tock' descriptions have been corrected.

Jerry Sanders, AMD's brash former CEO, once opined, "Real men have fabs. These fabless guys are nobodies, just boys." In recent times, however, Sanders' comments seemed increasingly antiquated, with various foundries (most notably mighty TSMC) serving the fabrication needs of an increasing number and variety of semiconductor device suppliers. A notable number of those suppliers had historically handled their own manufacturing but eventually decided to mothball their fabs and rely on foundries instead. To wit, AMD announced its intent in late 2008 to spin off the chip-manufacturing portion of its business into a separate entity, originally (and unoriginally) called "The Foundry Company" but later renamed Globalfoundries.

The conventional thinking went something like this: semiconductor process and equipment development is getting exponentially more expensive, making it infeasible for a growing percentage of chip companies (including start-ups) to "go it alone." Instead, by contracting out manufacturing to a separate entity whose business is that and nothing but that, chip companies could benefit from the improved economies of scale, while the foundry ensured that it had a sufficient number of customers to fully utilize its capacity. The end result, it was believed, would be more rapid and cost-effective process development for a foundry versus what even the largest stand-alone chip supplier could achieve.

Intel apparently didn't get that particular memo. Granted, the company has been the world's largest semiconductor supplier for the last 20 years straight, and is increasingly looking like a foundry in its own right. But although it has recently forged a few manufacturing partnership agreements, the bulk of its fab capacity is devoted to Intel-branded products. And, as exemplified by the "Ivy Bridge" CPUs that have just entered volume production, Intel is now a full process node ahead of even leading-edge foundry-based competitors. Ivy Bridge processors are built on the company's 22 nm Tri-Gate process (Tri-Gate is Intel's brand name for a FinFET structure), while AMD (via Globalfoundries) and NVIDIA and Qualcomm (via TSMC) are still striving to ramp 32 nm- and 28 nm-based SoCs into production.
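As a rough illustration of what "a full process node" means, the sketch below applies the idealized textbook scaling rule, in which die area for an unchanged design shrinks with the square of the nominal feature size. This rule is an assumption for illustration only: real processes deviate from ideal scaling, and node names are only loosely tied to physical dimensions, so the numbers say nothing definitive about Intel's, TSMC's or Globalfoundries' actual densities.

```python
# Idealized geometric scaling between process nodes.
# Assumption (for illustration only): area for an unchanged design scales
# with the square of the node's nominal feature size.


def linear_shrink(new_nm: float, old_nm: float) -> float:
    """Linear dimension of a feature on new_nm relative to old_nm."""
    return new_nm / old_nm


def ideal_area_ratio(new_nm: float, old_nm: float) -> float:
    """Area of the same design on new_nm relative to old_nm (ideal scaling)."""
    return linear_shrink(new_nm, old_nm) ** 2


for old in (32, 28):
    print(
        f"22 nm vs {old} nm: linear {linear_shrink(22, old):.2f}x, "
        f"area {ideal_area_ratio(22, old):.2f}x"
    )
```

Under this idealized rule, moving from 32 nm to 22 nm is about a 0.69x linear shrink that roughly halves area (0.47x), which is the conventional meaning of a full node step; 28 nm to 22 nm is a smaller step of about 0.62x in area.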

Ivy Bridge is the latest example of Intel's "tick-tock" manufacturing model, which the company adopted in the mid-2000s. A "tock" product exhibits substantial design advancements, such as a new microarchitecture, but is fabricated on the same process as its predecessor. It's followed by a "tick" successor, which makes comparatively small design enhancements but migrates to the next process node. By alternating between feature-set and process evolutions, the company minimizes the execution risk of any particular product generation.

For that reason, it makes sense to begin a discussion of the Ivy Bridge "tick" with a review of the 32 nm "Sandy Bridge" "tock" precursor that formed its foundation (Figure 1).

Figure 1. Sandy Bridge, formally launched in product form in early 2011, was Intel's first new microarchitecture in three years and forms the foundation of today's Ivy Bridge descendant.

Sandy Bridge was first demonstrated at the September 2009 Intel Developer Forum, entered volume production in January 2011, and was the primary competitor of AMD's "Bulldozer" processors described in InsideDSP last October. The associated Wikipedia entry for Sandy Bridge provides the following concise list of new and enhanced features versus "Nehalem"-microarchitecture predecessors (which date from 2008):

  • 32 kB data + 32 kB instruction L1 cache (3 clocks) and 256 kB L2 cache (8 clocks) per core
  • Shared L3 cache includes the processor graphics (LGA 1155)
  • 64-byte cache line size
  • Two load/store operations per CPU cycle for each memory channel
  • Decoded micro-operation cache and enlarged, optimized branch predictor
  • Improved performance for transcendental mathematics, AES encryption (AES instruction set), and SHA-1 hashing
  • 256-bit/cycle ring bus interconnect between cores, graphics, cache and System Agent Domain
  • Advanced Vector Extensions (AVX) 256-bit instruction set with wider vectors, new extensible syntax and rich functionality
  • Intel Quick Sync Video, hardware support for video encoding and decoding
  • Up to 8 physical cores or 16 logical cores through Hyper-Threading
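The 64-byte line size and 32 kB L1 data cache in the list above determine how a memory address maps onto the cache. The sketch below decomposes an address into tag, set index and line offset, and counts how many lines a load touches. It assumes 8-way set associativity for the L1 data cache (a parameter the list does not state), so treat the set count as illustrative. The line-crossing check is relevant to 256-bit (32-byte) AVX loads, which touch two cache lines when they straddle a line boundary.

```python
from typing import Tuple

LINE_SIZE = 64          # bytes per cache line (from the feature list)
L1D_SIZE = 32 * 1024    # 32 kB L1 data cache per core (from the feature list)
WAYS = 8                # assumption: 8-way set-associative L1 data cache

NUM_SETS = L1D_SIZE // (LINE_SIZE * WAYS)   # 64 sets under the assumption above
OFFSET_BITS = LINE_SIZE.bit_length() - 1    # log2(64) = 6
SET_BITS = NUM_SETS.bit_length() - 1        # log2(64) = 6


def split_address(addr: int) -> Tuple[int, int, int]:
    """Split a byte address into (tag, set index, byte offset within the line)."""
    offset = addr & (LINE_SIZE - 1)
    set_index = (addr >> OFFSET_BITS) & (NUM_SETS - 1)
    tag = addr >> (OFFSET_BITS + SET_BITS)
    return tag, set_index, offset


def lines_touched(start: int, length: int) -> int:
    """Number of distinct cache lines covered by bytes [start, start + length)."""
    first_line = start // LINE_SIZE
    last_line = (start + length - 1) // LINE_SIZE
    return last_line - first_line + 1


# A 32-byte (256-bit AVX) load: aligned vs. straddling a line boundary.
print(lines_touched(64, 32))   # starts on a line boundary
print(lines_touched(48, 32))   # bytes 48..79 straddle two lines
print(split_address(0x12345))
```

Because 64 is a multiple of 32, aligning AVX data to 32-byte boundaries keeps every 32-byte load within a single cache line.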

Perhaps the most significant of these architectural enhancements was the high-speed, bidirectional ring interconnect bus that links the CPU cores to each other, to the shared L3 cache, and to the graphics core. However, from a digital signal processing perspective, several of the other features listed above also merit further attention. As the list notes, Sandy Bridge integrated the graphics processor on the same die as the CPU; in its Nehalem-family predecessors, the graphics processor had been combined with the CPU in the same package, but as a separate die in a multi-die module (fabricated on a one-generation-older 45 nm process).
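To see why a bidirectional ring helps, the sketch below counts hops between ring stops when traffic can travel only one way versus when it can take the shorter direction. The six-stop layout (four core/L3-slice stops plus graphics and the System Agent) is a simplifying assumption for a quad-core part, and real ring latency also depends on arbitration and clocking details that this model ignores.

```python
def ring_hops(src: int, dst: int, stops: int, bidirectional: bool) -> int:
    """Hops from src to dst; a bidirectional ring takes the shorter direction."""
    forward = (dst - src) % stops
    if bidirectional:
        return min(forward, stops - forward)
    return forward


def average_hops(stops: int, bidirectional: bool) -> float:
    """Mean hop count over all ordered pairs of distinct stops."""
    total = 0
    pairs = 0
    for src in range(stops):
        for dst in range(stops):
            if src != dst:
                total += ring_hops(src, dst, stops, bidirectional)
                pairs += 1
    return total / pairs


# Assumed layout: 4 core/L3-slice stops + graphics + System Agent = 6 stops.
STOPS = 6
print("one-way ring:      ", average_hops(STOPS, bidirectional=False))
print("bidirectional ring:", average_hops(STOPS, bidirectional=True))
```

For six stops, the average trip drops from 3.0 hops on a one-way ring to 1.8 hops when the shorter direction can be used, and the worst case falls from 5 hops to 3.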

The Sandy Bridge GPU did not natively support OpenCL (Intel's OpenCL implementation instead ran in software on the CPU), although some GPGPU (general-purpose computing on graphics processing units) work was still possible via traditional graphics APIs such as DirectX. The Sandy Bridge GPU also implemented the enhanced transcendental math support in dedicated shader hardware; Intel claimed, for example, that sine and cosine operations were several orders of magnitude faster than in prior-generation products.

For digital signal processing algorithms that leverage general-purpose processor instructions, Sandy Bridge's Hyper-Threading simultaneous multithreading (SMT) support presented the programmer with twice as many cores as physically existed on the die; improved SMT efficiency meant that the additional "virtual" cores could be used in more situations than before. Those CPU instructions included the new 256-bit AVX (Advanced Vector Extensions) SIMD operations, which follow in the footsteps of 64-bit MMX (MultiMedia Extensions) and several generations' worth of 128-bit SSE (Streaming SIMD Extensions). General-purpose integer execution also improved in Sandy Bridge; Add With Carry (ADC) throughput was doubled, for example, while 64x64-bit multiplies saw a ~25% speedup, according to Intel.
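The SIMD register widths (64-bit MMX, 128-bit SSE, 256-bit AVX) translate directly into how many data elements one instruction processes, and ADC is the building block for integer arithmetic wider than a single 64-bit register. The Python sketch below models only the arithmetic, not the instructions or their performance: the hypothetical simd_lanes() counts elements per register, and the hypothetical add_with_carry_chain() adds multi-limb integers by propagating a carry from limb to limb, which is what a chain of x86 ADD/ADC instructions computes.

```python
from typing import List, Tuple

# Register widths in bits for the x86 SIMD extensions discussed above.
REGISTER_BITS = {"MMX": 64, "SSE": 128, "AVX": 256}

MASK64 = (1 << 64) - 1


def simd_lanes(register_bits: int, element_bits: int) -> int:
    """Number of elements of element_bits each that fit in one SIMD register."""
    return register_bits // element_bits


def add_with_carry_chain(a: List[int], b: List[int]) -> Tuple[List[int], int]:
    """Add two equal-length, little-endian lists of 64-bit limbs.

    Each step adds the incoming carry, keeps the low 64 bits and passes the
    overflow bit on, mirroring an ADD on the first limb followed by ADCs.
    Returns (sum limbs, final carry out).
    """
    if len(a) != len(b):
        raise ValueError("operands must have the same number of limbs")
    result = []
    carry = 0
    for x, y in zip(a, b):
        total = x + y + carry
        result.append(total & MASK64)
        carry = total >> 64
    return result, carry


def limbs_to_int(limbs: List[int]) -> int:
    """Reassemble little-endian 64-bit limbs into a Python integer."""
    value = 0
    for i, limb in enumerate(limbs):
        value |= limb << (64 * i)
    return value


for name, bits in REGISTER_BITS.items():
    print(name, "32-bit lanes:", simd_lanes(bits, 32))

# (2**64 - 1) + 1 carries into the second limb.
print(add_with_carry_chain([MASK64, 0], [1, 0]))
```

Big-number code such as public-key cryptography is dominated by exactly this kind of carry chain, which is why ADC throughput matters even though each step is simple.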

Last but not least was the Quick Sync Video function block, which at first glance might seem a curious inclusion, even though it took up very little incremental die area (Figure 2).

Figure 2. The fixed-function blocks that implement notable portions of Intel's Quick Sync Video feature are so small that the company doesn't bother pointing them out on the die plot, but they notably improve the performance and power consumption of video decoding and encoding operations.

After all, the multi-core CPU ran at clock speeds up to 3.6 GHz and was supplemented by the numerous shader processors and other resources in the GPU; between them, there was seemingly plenty of compute horsepower for decoding, encoding and transcoding video. And there was...but it came at the expense of battery life. Quick Sync Video handled decoding of MPEG-2, MPEG-4 (including H.264) and VC-1 (also known as WMV9) video, along with encoding (and transcoding) to MPEG-4. Intel claimed that, versus the GPU-based approach used in Nehalem, Quick Sync Video consumed half the power while decoding HD video. The company also showcased a transcoding demo that converted a 3-minute-long 1080p 30 Mbps HD video to an iPhone-compatible 640x360 pixel H.264 clip in 14 seconds, an average conversion rate of roughly 400 fps.
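The demo figures can be sanity-checked with simple arithmetic. The source frame rate is not stated, so the sketch below assumes a 30 fps source; under that assumption the 3-minute clip holds 5,400 frames, and converting them in 14 seconds works out to roughly 386 fps, in line with the approximately 400 fps Intel quoted. The same sketch estimates the source stream size from the 30 Mbps bitrate. The function names are illustrative.

```python
def average_transcode_fps(duration_s: float, source_fps: float, wall_clock_s: float) -> float:
    """Average frames converted per second of wall-clock time."""
    frames = duration_s * source_fps
    return frames / wall_clock_s


def stream_size_megabytes(duration_s: float, bitrate_mbps: float) -> float:
    """Approximate stream size in megabytes (10**6 bytes) at a constant bitrate."""
    return duration_s * bitrate_mbps / 8


CLIP_S = 3 * 60        # 3-minute source clip (from the demo description)
SOURCE_FPS = 30.0      # assumption: the source frame rate is not given
TRANSCODE_S = 14.0     # wall-clock conversion time (from the demo description)
BITRATE_MBPS = 30.0    # source bitrate (from the demo description)

print(f"{average_transcode_fps(CLIP_S, SOURCE_FPS, TRANSCODE_S):.0f} fps")
print(f"{CLIP_S / TRANSCODE_S:.1f}x faster than real time")
print(f"~{stream_size_megabytes(CLIP_S, BITRATE_MBPS):.0f} MB source stream")
```

Working backward, 400 fps over 14 seconds corresponds to 5,600 frames, or about 31 frames per second of source video, so the quoted figure is consistent with a roughly 30 fps source.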

This all brings us to Sandy Bridge's 22 nm successor, Ivy Bridge. To date, Intel has announced only a portion of the overall processor suite: quad-core desktop (Table 1) and mobile (Table 2) CPUs. More economical dual-core and ultra-low-voltage processors are still to come, targeting mainstream desktop systems and "ultrabooks", respectively. High-end processors for workstations and servers have also yet to be formally unveiled.

Processor     | Core Clock | Cores/Threads | L3 Cache | Max Turbo Clock | TDP   | Price
--------------|------------|---------------|----------|-----------------|-------|------
Core i7 3960X | 3.3 GHz    | 6/12          | 15 MB    | 3.9 GHz         | 130 W | $999
Core i7 3930K | 3.2 GHz    | 6/12          | 12 MB    | 3.8 GHz         | 130 W | $583
Core i7 3820  | 3.6 GHz    | 4/8           | 10 MB    | 3.9 GHz         | 130 W | $294
Core i7 3770K | 3.5 GHz    | 4/8           | 8 MB     | 3.9 GHz         | 77 W  | $313
Core i7 3770  | 3.4 GHz    | 4/8           | 8 MB     | 3.9 GHz         | 77 W  | $278
Core i5 3570K | 3.4 GHz    | 4/4           | 6 MB     | 3.8 GHz         | 77 W  | $212
Core i5 3550  | 3.3 GHz    | 4/4           | 6 MB     | 3.7 GHz         | 77 W  | $194
Core i5 3450  | 3.1 GHz    | 4/4           | 6 MB     | 3.5 GHz         | 77 W  | $174
Core i7 2700K | 3.5 GHz    | 4/8           | 8 MB     | 3.9 GHz         | 95 W  | $332
Core i5 2550K | 3.4 GHz    | 4/4           | 6 MB     | 3.8 GHz         | 95 W  | $225
Core i5 2500  | 3.3 GHz    | 4/4           | 6 MB     | 3.7 GHz         | 95 W  | $205
Core i5 2400  | 3.1 GHz    | 4/4           | 6 MB     | 3.4 GHz         | 95 W  | $195
Core i5 2320  | 3.0 GHz    | 4/4           | 6 MB     | 3.3 GHz         | 95 W  | $177
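Table 1. Ivy Bridge Desktop CPUs (Core i7 3770K through Core i5 3450), listed alongside Sandy Bridge-E (Core i7 3960X, 3930K and 3820) and Sandy Bridge (Core i7 2700K and Core i5 2550K through 2320) processors for comparison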

Processor      | Core Clock | Cores/Threads | L3 Cache | Max Turbo Clock | TDP  | Price
---------------|------------|---------------|----------|-----------------|------|--------------
Core i7 3920XM | 2.9 GHz    | 4/8           | 8 MB     | 3.8 GHz         | 55 W | $1096
Core i7 3820QM | 2.7 GHz    | 4/8           | 8 MB     | 3.7 GHz         | 45 W | $568
Core i7 3720QM | 2.6 GHz    | 4/8           | 6 MB     | 3.6 GHz         | 45 W | $378
Core i7 3615QM | 2.3 GHz    | 4/8           | 6 MB     | 3.3 GHz         | 45 W | Not announced
Core i7 3612QM | 2.1 GHz    | 4/8           | 6 MB     | 3.1 GHz         | 35 W | Not announced
Core i7 3610QM | 2.3 GHz    | 4/8           | 6 MB     | 3.3 GHz         | 45 W | Not announced
Table 2. Ivy Bridge Mobile CPUs

From a CPU standpoint, the Ivy Bridge changes relative to Sandy Bridge are modest, befitting the family's generational "tick" status. The division block used in both floating-point and integer operations has twice the throughput of the one found in Sandy Bridge, for example. And an integrated digital random number generator, accessible by both user- and O/S-level software, can create standards-compliant 16-, 32- or 64-bit random numbers at an up-to-3 Gbps rate.
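To put the up-to-3 Gbps figure in perspective, the sketch below converts it into values per second for each supported output width and into the time needed to fill a buffer. It treats 3 Gbps as a sustained aggregate peak, which is an assumption; actual throughput depends on how many threads issue requests and on per-instruction overhead (software reaches the generator through the RDRAND instruction). The function names are illustrative.

```python
PEAK_RATE_BPS = 3e9  # up-to-3 Gbps peak rate quoted in the text (assumed sustained)


def values_per_second(bits_per_value: int, rate_bps: float = PEAK_RATE_BPS) -> float:
    """Upper bound on random values delivered per second at a given width."""
    return rate_bps / bits_per_value


def seconds_to_fill(num_bytes: int, rate_bps: float = PEAK_RATE_BPS) -> float:
    """Lower bound on the time to fill a buffer with num_bytes random bytes."""
    return num_bytes * 8 / rate_bps


for width in (16, 32, 64):
    print(f"{width}-bit: {values_per_second(width) / 1e6:.1f} million values/s")

print(f"1 MB buffer: {seconds_to_fill(1_000_000) * 1e3:.2f} ms")
```

Even at the 64-bit width, the quoted peak corresponds to nearly 47 million values per second, ample for seeding and key generation in most DSP and security workloads.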

The GPU evolution is much more substantial. Whereas the HD 3000 graphics processor in Sandy Bridge supported only version 10 of the DirectX API, the HD 4000 graphics core in Ivy Bridge supports the latest-generation DirectX 11, along with OpenGL 3.1 and OpenCL 1.1. It raises the maximum EU (execution unit, i.e. shader) count from 12 to 16, while also improving per-EU features and performance. And since Quick Sync Video leverages not only fixed-function decode and encode blocks but also the graphics core's EUs, Intel predicts that Ivy Bridge transcoding performance will be up to twice that of Sandy Bridge.

The corresponding chipsets for both the desktop and mobile Ivy Bridge processors are also upgraded from their Sandy Bridge predecessors. Notable platform additions include PCI Express 3.0 support (integrated in the processor itself) and native USB 3.0 support (integrated in the chipset). And companion second-generation Thunderbolt chipsets will likely bring support for the high-speed external interface bus to Windows systems, thereby expanding the technology beyond today's Apple-only adoption.
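The headline data rates for PCI Express and USB are raw signaling rates, and line encoding consumes part of them. The sketch below computes per-direction payload bandwidth after encoding overhead only, using the published encodings (8b/10b for PCI Express 2.0 and USB 3.0, 128b/130b for PCI Express 3.0). Packet and protocol overhead, which this model ignores, reduces real-world throughput further; the function name is illustrative.

```python
from typing import Dict, Tuple

# (raw rate in gigatransfers/s per lane or link, payload bits, encoded bits)
LINKS: Dict[str, Tuple[float, int, int]] = {
    "PCI Express 2.0 (per lane)": (5.0, 8, 10),     # 8b/10b encoding
    "PCI Express 3.0 (per lane)": (8.0, 128, 130),  # 128b/130b encoding
    "USB 3.0": (5.0, 8, 10),                        # 8b/10b encoding
}


def payload_mb_per_s(rate_gt_s: float, payload_bits: int, encoded_bits: int,
                     lanes: int = 1) -> float:
    """Per-direction payload bandwidth in MB/s (10**6 bytes) after line encoding."""
    raw_bits_per_s = rate_gt_s * 1e9 * lanes
    return raw_bits_per_s * payload_bits / encoded_bits / 8 / 1e6


for name, (rate, payload, encoded) in LINKS.items():
    print(f"{name}: {payload_mb_per_s(rate, payload, encoded):.0f} MB/s per direction")

# A x16 graphics slot on PCI Express 3.0.
print(f"PCI Express 3.0 x16: {payload_mb_per_s(8.0, 128, 130, lanes=16) / 1000:.2f} GB/s")
```

PCI Express 3.0 nearly doubles per-lane payload bandwidth over 2.0 even though the raw rate rises only from 5 to 8 GT/s, because the leaner encoding spends about 1.5% of the bits on overhead instead of 20%.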

Intel calls Ivy Bridge a "tick" but, particularly given the notable integrated graphics improvements (which likely have entry-level discrete graphics providers AMD, NVIDIA and VIA concerned about their future fortunes), it could perhaps more accurately be labeled a "tick-plus". Not only will Ivy Bridge keep Intel in a solid competitive position versus AMD in conventional computer designs, but its upcoming low-voltage variants will also shore up the company's defenses against upstart ARM licensees in various emerging ultra-thin system form factors.
