Massively parallel processor supplier Tilera is a company that InsideDSP has kept an eye on for nearly a decade now, stretching back to BDTI's benchmarking of the company's first-generation TILE architecture in its 64-core form (see sidebar "Company, Architecture and Product Line Background"). After an initial flurry of product releases, privately held Tilera grew uncharacteristically quiet over the next half-decade, focusing on rolling out the remainder of the TILE-Gx family, conserving its fiscal resources and closing design wins in applications such as intelligent networking, cryptography, security, server processing acceleration, videoconferencing, video transcoding and video surveillance.
The middle of last year brought news that the company was in the process of being acquired by the larger (and publicly held) EZchip Semiconductor; the transaction closed later that same year. In a recent conversation, Bob Doud, the company's director of marketing (and former Tilera director of processor strategy), echoed statements made in the press release announcing the acquisition, noting that the two companies' product lines and customers were non-overlapping and that the combination would therefore expand EZchip's potential market from "a few hundred million dollars a year to billions of dollars a year":
The acquisition represents a significant move for EZchip. In particular:
- Leapfrogs EZchip into a fast-growing market: EZchip will become a meaningful player in the multi-core CPU market building upon the existing Tilera products, customers, revenues and over 100 patents.
- Doubles EZchip’s total available market (TAM): To $2 billion, consisting of NPUs, multi-core CPUs, smart network adapters and appliances, for the data-center and telecom markets.
- Adds data center and cloud networks to EZchip’s target markets: Market segments already identified by EZchip as strategic for future growth.
- Broadens EZchip’s customer base: Adds over 100 Tilera customers in the various market segments.
- Diversifies EZchip’s product lines: With the addition of 9-, 16-, 36- and 72-core CPUs and a family of intelligent network interface cards and white-box appliances.
- Clear product roadmap synergies: Leverages technology and expertise from EZchip’s leading high-speed NPUs and Tilera’s leading multi-core CPUs to build new powerful processors.
- Strengthens EZchip’s US presence: In sales, marketing, R&D and operations, in both Silicon Valley and Boston.
The primary purpose of InsideDSP's discussion with Doud, however, was not to talk about the acquisition itself but rather the first product fruits from it, the upcoming TILE-Mx family. Previously, Tilera's devices had all been based on a proprietary processor core derived from the founders' prior work at MIT and DARPA. And in fact, according to Doud, the company had a next-generation 100+ core product line well along in development, based on a proprietary dual core-per-tile approach. But in parallel, Tilera was toiling away on a "skunkworks" project that was ARM-based.
A migration to an ARM-based "manycore" design promised many long-term benefits to the company and its customers, such as relief from the burden of ongoing CPU core development, and the ability to leverage ARM's partner ecosystem versus needing to continue doing its own tool suite development. But the effort required both to create an ARM-based silicon architecture and toolset and to migrate existing customers to it would be significant, as would be the cost of an ARM license. The cash and headcount infusion generated by the EZchip acquisition neatly relieved both burdens, enabling Tilera's ARM aspirations to move forward.
Specifically, each TILE-Mx processing tile comprises four ARM Cortex-A53 ARMv8 64-bit processor cores, each offering 128-bit NEON SIMD and floating point acceleration capabilities (Figure 1). As its name implies, the first product in the family, the TILE-Mx100, integrates 100 total ARM cores, tied together by a SkyMesh high performance (25 Tbps aggregate bandwidth) and cache-coherent 2D interconnect scheme. SkyMesh is a minor evolution of the company's prior iMesh approach, differentiated primarily by the addition of ARM core-specific bus "hooks". Each SkyMesh bus (east, west, north and south) consists of several hundred wires comprising three protocol levels; the total amount of interconnect varies depending on the size of the device (i.e. the number of tiles).
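The tile arithmetic and the four-direction mesh described above can be sketched in a few lines of Python. This is only an illustration: the core counts come from the article, but the square 5x5 arrangement, the coordinate scheme and the helper names (`tile_count`, `neighbors`, `mesh_hops`) are assumptions, since the article does not state the grid dimensions or how SkyMesh routes traffic.

```python
import math

CORES_PER_TILE = 4   # from the article: four Cortex-A53 cores per tile
TOTAL_CORES = 100    # from the article: TILE-Mx100


def tile_count(total_cores=TOTAL_CORES, cores_per_tile=CORES_PER_TILE):
    """Number of tiles needed to hold the given number of cores."""
    return total_cores // cores_per_tile


def neighbors(x, y, width, height):
    """Return {direction: (x, y)} for the mesh links that exist at a tile.

    Tiles on an edge or corner have fewer than four neighbors.
    """
    candidates = {
        "north": (x, y - 1),
        "south": (x, y + 1),
        "west": (x - 1, y),
        "east": (x + 1, y),
    }
    return {
        direction: (nx, ny)
        for direction, (nx, ny) in candidates.items()
        if 0 <= nx < width and 0 <= ny < height
    }


def mesh_hops(src, dst):
    """Minimum number of tile-to-tile hops between two tiles in a 2D mesh."""
    return abs(src[0] - dst[0]) + abs(src[1] - dst[1])


tiles = tile_count()
side = math.isqrt(tiles)  # ASSUMPTION: tiles laid out as a square grid

print(f"{tiles} tiles, assumed {side}x{side} grid")
print("corner tile links:", neighbors(0, 0, side, side))
print("corner-to-corner hops:", mesh_hops((0, 0), (side - 1, side - 1)))
```

Under the assumed square layout, a corner tile has only two mesh links while an interior tile has all four, which is one way to see why the total amount of interconnect depends on the number of tiles.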
Figure 1. EZchip transitions from a past Tilera-proprietary processor core approach to a quad ARM Cortex-A53 atomic "tile" (top) in assembling the upcoming initial 100-core TILE-Mx product family member (bottom)
The TILE-Mx100 integrates 40 MBytes of total on-chip cache, subdivided into an L1/L2/L3 hierarchy, along with a series of Tile Core Accelerators (TCAs), many obtained from parent company EZchip: a traffic management engine, a programmable network front-end, accelerated lookup functions, cryptography acceleration (both symmetric and public key), and statistics and atomic-flow table acceleration. One key reason why Tilera remained true to its proprietary processor architecture for the first three product family generations, according to Doud, was that the company was unable to find a licensable CPU core alternative that had the necessary combination of license-plus-silicon implementation price, performance and power consumption. The timing of the EZchip acquisition, he said, was fortuitous because it aligned with the availability of the Cortex-A53, which for the first time delivers the necessary efficiency and area density. Features needed for EZchip’s target applications and lacking in the Cortex-A53 are handled by the EZchip-supplied hardware accelerators.
The TILE-Mx product family will be fabricated in TSMC's 28nm HPM (high performance mobile) process variant. Although the company is publicly discussing the TILE-Mx100 now, the chip is still in development, having experienced a mid-stream re-definition delay driven in part by the Tilera acquisition and the subsequent decision to add EZchip-developed TCAs to the design. Sampling is forecast to begin sometime in the second half of next year, with production tentatively slated for Q1 2017. Doud says it's premature at this time to provide firm performance estimates, although "ARM-published numbers for the Cortex-A53 are in the high 1 to low 2 GHz range, where we also expect to roughly be." And with respect to pricing, Doud says that it's also "way early" to be announcing specifics, although his rough guidance is for something "slightly south of the one-thousand dollar price range, in thousand-unit volumes."
Prior TILE families carved out modest-sized market niches with well-known customers such as Brocade, Check Point and Cisco. The financial and staffing infusion provided by new parent EZchip will go a long way toward fortifying the TILE product line's long-term viability. Still to be determined, however, is how smoothly the company will navigate its proprietary-to-ARM migration, both for development tools and for existing customers, even though many of the APIs called by those customers' current code will be preserved. Equally unknown at this time is what the competitive landscape will look like when TILE-Mx devices end up in potential customers' hands.
Sidebar: Company, Architecture and Product Line Background
In late 2009, InsideDSP said the following about Tilera, which was founded in 2004 and released its first-generation TILE devices three years later:
Several years ago the company developed a mesh architecture that supports a large array of homogeneous processor cores. The company’s initial products were the TILE64 and TILEPro, based on an array of 32-bit, 3-way VLIW cores. Tilera designed the architecture for multicore from the outset, and optimized the bus structure for supporting many cores, which requires high bandwidth onto the chip and efficient inter-core communications. One advantage to this approach is that it is highly scalable; the size of the array can vary from chip to chip while maintaining software compatibility.
In explaining the TILEPro evolution beyond TILE, Wikipedia notes the following improvements: mesh network enhancements to manage cache coherency, cache size, "way" associativity and transfer scheme upgrades, system memory "striping" to better balance loading, and instruction set extensions for multimedia applications and other functions. And the third-generation TILE-Gx, according to InsideDSP, was "based on Tilera’s earlier TILE architectures, but is significantly enhanced."
TILE-Gx, as documented by InsideDSP in 2009, "uses a new 64-bit instruction set architecture that includes 75 new instructions relative to the TILE processors (20 of which are SIMD instructions). Tilera has added instruction-set support for bit manipulation and quad multiply-accumulate operations—useful for a variety of DSP-oriented algorithms. The new chips also have a packet processing accelerator, the “mPIPE,” that isn’t included in the TILE families, and accelerators for compression and encryption."