NVIDIA’s announcement that it would spend $6.9B to buy the data center networking company Mellanox surprised many people, including long-time NVIDIA watchers. It is by far the largest purchase NVIDIA has ever made. The companies it bought previously were much smaller and were often bought at fire-sale prices. On a relative basis, the 2001 purchase of rival 3dfx’s assets is the closest analogy, since NVIDIA was a much smaller company back then, as I explained in an earlier article. Buying the 3dfx assets (and hiring 100 of its staff) was an easier move to justify, as the new assets could be put to work immediately on NVIDIA’s core business – PC graphics. Mellanox is in a completely different business – data center networking. Mellanox’s products complement NVIDIA’s products without overlap. And with this purchase, NVIDIA signals that it’s serious about being more than just a GPU company. With its accelerator business growing exponentially and with its move into networking, NVIDIA is now a data center company.
There are so many interesting aspects of the Mellanox acquisition to explore: NVIDIA’s deeper entry into Israel’s tech industry; the other computing-related assets at Mellanox (EZchip and Tilera); how Jensen Huang’s management style will play in Israel; and Mellanox’s support for the CCIX compute accelerator connection protocol vs. NVIDIA’s own NVLink. I’ll save them for a later article. For now, let’s explore this new NVIDIA.
How did NVIDIA get to be a data center company?
It all starts with the discovery (at Stanford), around 2006, that using Graphics Processing Units (GPUs) for some computationally intensive workloads provides a major improvement in performance per watt over a traditional processor, or CPU. It turned out that all the little compute elements used to process pixels (texture processing) could be used for crude scientific calculations. The field was initially called GPU Compute. At the same time, graphics were also getting more complex, and full-featured math processing capabilities were being added to GPUs. A few people at NVIDIA, including Prof. Bill Dally and the late John Nickolls, took note of the chance to expand the use of GPUs and make a significant difference in the High-Performance Computing (HPC) market. As a result, NVIDIA added more functionality to its GPUs for HPC workloads and developed the Tesla product line, modeled after its Quadro professional workstation product line. The company also developed the CUDA programming framework for its GPUs, but never supported it on any other GPUs. AMD, the main competing GPU supplier, chose to wait for OpenCL, which, as is frequent with industry standards, was much slower to develop. NVIDIA has been remarkably successful in HPC: its GPUs are in 127 of the systems on the TOP500 list of supercomputers, and they also power the world’s two fastest supercomputers, the U.S. Department of Energy’s Summit at Oak Ridge National Laboratory and Sierra at Lawrence Livermore National Lab.
Because of NVIDIA’s work on GPU computing for HPC, some researchers in AI decided to use the GPU to accelerate new machine learning algorithms called deep convolutional neural networks (DCNNs). The combination of the new DCNNs and GPUs made the training and operation (or inference) of AI neural networks much faster and more accurate than anything before. That led to a Cambrian explosion of AI research and applications, with NVIDIA leading the wave. The company quickly adapted its GPUs for these new workloads, adding new math functions and even dedicated processing elements called Tensor Cores. NVIDIA also developed a series of software libraries, under the name cuDNN, optimized for CUDA and deep neural networks.
As a result of the explosion in AI research, each cloud vendor has developed its own framework – Google has TensorFlow, Facebook has PyTorch/Caffe2, etc. Even with this fragmentation of AI frameworks, the field is still growing at a rapid pace. Research into new algorithms is still continuing, and therefore a flexible approach has long-term cost of ownership benefits. Because of the flexibility that accelerators like the GPU (or alternatively an FPGA) offer, it is easy to adapt to new algorithms. In his GTC 2019 keynote, Jensen called this architecture “PRADA” – PRogrammable Acceleration of multiple Domains from one Architecture. That architecture compatibility allows the building of an installed base of software and systems and drives down the cost of the infrastructure.
Moving from chips to systems
In Jensen’s keynote, he made the case that data science is now the fourth pillar of the scientific method. NVIDIA realizes that there’s a shortage of data scientists and AI researchers, so the productivity of these people is important. To keep the momentum going, it is also important to bring these resources to a wider range of developers. To that end, the company designed DGX workstations and servers fully loaded with CUDA-X tools and libraries for ML research. The company is also expanding its reach to data scientists with new data science platforms from several system OEMs, including Dell, HP Inc., and Lenovo.
Even with the new systems and tools, the industry still faces the challenge of sorting through exabytes of new and existing data for business and scientific insights. This is driving data science to solve the problem of too much data. When we get autonomous vehicles, they will generate exabytes of information that will need to be processed. This is why the company believes more and more data centers will need to build in AI processing to sort through all this data.
Supercomputers vs. Hyperscalers
In its work in HPC, NVIDIA has focused on maximum compute performance on very large problems (scale up). Hyperscale data centers often have many, many compute tasks running concurrently (scale out). The needs of data science fall right in between – large data sets and many users, with characteristics of both scale up and scale out.
To meet these varying needs, NVIDIA has built a number of server projects with Mellanox providing the rack networking. As a result of Mellanox’s success, it became an acquisition target for a variety of silicon and cloud companies, including Intel and Microsoft. However, rather than going to one of these companies, Mellanox sought a friendlier partner like NVIDIA. When approached with the opportunity to be the white knight for Mellanox, Jensen jumped at the chance. With the growing containerization and hyperscaling of workloads with data analysis programs like Hadoop, Spark, and RAPIDS, he sees an exponential increase in rack-to-rack communications, often referred to as east-west communications in a data center. As a result, low-latency networking has become critical to creating a compute fabric. The Mellanox networking technology can make a data center flexible enough for these evolving workloads. The key development by Mellanox was offloading networking tasks from the CPU to accelerators, and in the future it will add AI to its switching products to move data more efficiently.
For server scale up applications (like HPC), the goal is to make multiple GPUs work like one giant GPU. This is where NVIDIA’s NVLink comes into play, tying multiple GPUs together into a cluster. For the broader scale out infrastructure, the Tesla T4 card can be deployed. These 70W half-height PCIe cards fit into 2U rack chassis, so they can be liberally added to existing data centers. The T4 is NVIDIA’s most flexible data center offering – it can be used for inference, for training (just not at the same speed as the V100), for data science, for video transcoding, and for VDI (virtual desktop) applications. The T4 is underappreciated today, but in the future there’s going to be greater emphasis on inference in cloud and edge applications. This is also the area where NVIDIA will find the most competition, from Intel and a growing army of startups.
While there are many contenders for the throne of AI accelerator, NVIDIA’s products are still king of the hill, with the largest installed base of software. With the acquisition of Mellanox, NVIDIA has just expanded its data center domain.