As has become customary at NVIDIA's GPU Technology Conference (GTC), CEO Jensen Huang gave a two-hour keynote covering the company's major announcements and new partnerships in graphics processing, AI, and IoT applications.
The company introduced a new GPU platform, the NVIDIA DGX-2. The box, shown in the middle below, has 16 NVIDIA Tesla V100 GPUs with 81,920 CUDA cores and 10,240 Tensor Cores, providing 2 petaFLOPS of performance. It has 512 GB of GPU memory and 1.5 TB of system memory. The OS sits on two 960 GB NVMe SSDs, and internal storage is 30 TB (8 x 3.84 TB) of NVMe SSDs.
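The system totals follow directly from the per-GPU numbers. A quick sanity check, assuming NVIDIA's published per-V100 figures (5,120 CUDA cores, 640 Tensor Cores, 32 GB of HBM2), which are not stated in this article:

```python
# Sanity-check the DGX-2 aggregate specs from per-GPU V100 figures.
# Per-GPU values (CUDA cores, Tensor Cores, HBM2 capacity) are NVIDIA's
# published V100 specs, assumed here for illustration.
NUM_GPUS = 16
CUDA_CORES_PER_GPU = 5_120
TENSOR_CORES_PER_GPU = 640
HBM2_GB_PER_GPU = 32

total_cuda = NUM_GPUS * CUDA_CORES_PER_GPU      # 81,920 CUDA cores
total_tensor = NUM_GPUS * TENSOR_CORES_PER_GPU  # 10,240 Tensor Cores
total_hbm_gb = NUM_GPUS * HBM2_GB_PER_GPU       # 512 GB of GPU memory

# Internal storage: 8 x 3.84 TB NVMe SSDs, marketed as roughly 30 TB.
internal_tb = 8 * 3.84                          # 30.72 TB

print(total_cuda, total_tensor, total_hbm_gb, internal_tb)
```

The quoted 30 TB is simply the 30.72 TB raw total rounded down.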
The image below shows the NVIDIA CUDA-X AI ecosystem, with partners for various AI applications and services. CUDA-X is NVIDIA's collection of GPU-acceleration libraries for data science and AI. Note the mention of cloud storage vendors in the lower right of the figure.
With AI inference at the network edge growing in popularity, several vendors were showing edge products with GPUs and memory for edge AI applications. An HPE Edgeline product family from the GTC exhibits is shown below. These products provide processing, networking, and storage at co-location facilities and other network edge locations.
The NVIDIA keynote pointed out the value of fast storage and memory technology for the high-performance applications that GPUs enable.
According to NetApp, NVIDIA and NetApp are working closely to bridge the gap between the CPU and GPU universes and to better address a wide range of machine learning and deep learning training and inference needs. The company's ONTAP AI system is said to bring together the advantages of NVIDIA GPUs and NetApp's data pipeline expertise.
At its booth this year, NetApp featured an ONTAP AI solution combining three of the latest NVIDIA® DGX-2™ systems with NetApp® AFF A800 cloud-connected flash storage. The DGX-2 offers 10x the power of NVIDIA's first-generation DGX system.
NetApp also featured the Cisco FlexPod® Datacenter solution for AI and other workloads, which optimizes converged infrastructure. It included Cisco UCS blade and rack servers, Cisco Nexus® 9000 Series switches, Cisco UCS 6000 Series Fabric Interconnects, and NetApp® AFF A800 flash storage arrays.
NetApp also demonstrated its DGX-2 ONTAP AI system in a modular, DDC liquid-to-air cooled cabinet from ScaleMatrix. This cabinet combines the efficiency of water cooling with the flexibility of air cooling, handling up to 52 kW of power load in a single 45U cabinet. These cabinets can be deployed in nearly any environment, and provide clean-room quality environmental control, guaranteed air flow, and integrated security and fire suppression.
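The 52 kW figure puts the cabinet's density in perspective. A rough sketch, assuming NVIDIA's quoted maximum draw of about 10 kW per fully loaded DGX-2 (an external figure, not stated in this article):

```python
# Cabinet power-density sketch, assuming ~10 kW max draw per DGX-2
# (NVIDIA's published figure; an assumption, not from this article).
CABINET_KW = 52
CABINET_U = 45
DGX2_KW = 10

max_dgx2_per_cabinet = CABINET_KW // DGX2_KW  # about 5 systems per cabinet
kw_per_u = CABINET_KW / CABINET_U             # ~1.16 kW per rack unit

print(max_dgx2_per_cabinet, round(kw_per_u, 2))
```

By comparison, a typical air-cooled data center rack is often budgeted well under 20 kW.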
Pure Storage was showing AIRI, its AI-ready infrastructure combining NVIDIA DGX servers, Arista networking, and Pure Storage FlashBlades, originally introduced in 2018. These products support 2-4 PFLOPS of performance with NAND flash capacity from 119 to 374 TB. The 2019 products are two multi-chassis systems with multiple AIRIs daisy-chained to create a single larger logical unit. One version of these multi-chassis systems uses up to 9 NVIDIA DGX-1 systems for 9 PFLOPS of performance, and the other uses 3 NVIDIA DGX-2 systems for 6 PFLOPS. These new systems can be scaled to 64 racks with a leaf-spine network.
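The multi-chassis performance figures line up with the per-system numbers NVIDIA quotes, roughly 1 PFLOPS of tensor throughput per DGX-1 and 2 PFLOPS per DGX-2 (assumed figures, not stated in this article):

```python
# Check the AIRI multi-chassis performance claims against per-system
# tensor throughput: ~1 PFLOPS per DGX-1, ~2 PFLOPS per DGX-2 (assumed).
PFLOPS_DGX1 = 1.0
PFLOPS_DGX2 = 2.0

dgx1_config = 9 * PFLOPS_DGX1  # 9 DGX-1 systems -> 9 PFLOPS
dgx2_config = 3 * PFLOPS_DGX2  # 3 DGX-2 systems -> 6 PFLOPS

print(dgx1_config, dgx2_config)
```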
Pure also announced an AI-focused version of FlashStack, which uses Cisco servers and Nexus data center switches. The AI version uses Cisco UCS C480 M5 ML AI servers containing up to 8 NVIDIA Tesla V100 GPUs, connected with NVLink so that the eight GPUs work like a single massive GPU. The figure below is from Pure's product introduction.
A week before the GTC, NVIDIA announced that it intends to acquire Mellanox by the end of 2019. Mellanox makes storage and networking products, including the BlueField flash array controller, shown below on a PCIe card. BlueField supports NVMe over Fabrics and storage accelerator offload. The figure below is from the Mellanox exhibit.
Mellanox was also showing its ConnectX-6 adapter, which supports 200 Gb/s HDR InfiniBand and 200 Gb Ethernet over PCIe Gen4 x16 or Gen3 x32, as well as its Quantum 40-port HDR 200 Gb/s InfiniBand smart switch. Dolphin Interconnect Solutions was also showing PCIe fabric switch solutions at the GTC.
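The Gen4 x16 / Gen3 x32 pairing follows from lane bandwidth: a 200 Gb/s port needs about 25 GB/s of host bandwidth, which a single PCIe Gen3 x16 link cannot supply. A back-of-the-envelope check, assuming the standard PCIe per-lane signaling rates and 128b/130b encoding:

```python
# Why a 200 Gb/s ConnectX-6 port needs PCIe Gen4 x16 or Gen3 x32.
# Per-lane raw rates: Gen3 = 8 GT/s, Gen4 = 16 GT/s; both use 128b/130b
# encoding, so usable GB/s per lane (one direction) = rate * (128/130) / 8.
def pcie_gb_per_lane(gt_per_s):
    return gt_per_s * (128 / 130) / 8

link_need = 200 / 8  # 200 Gb/s network port -> 25 GB/s of host bandwidth

gen3_x16 = 16 * pcie_gb_per_lane(8)   # ~15.8 GB/s: not enough
gen3_x32 = 32 * pcie_gb_per_lane(8)   # ~31.5 GB/s: enough
gen4_x16 = 16 * pcie_gb_per_lane(16)  # ~31.5 GB/s: enough

print(round(gen3_x16, 1), round(gen3_x32, 1), round(gen4_x16, 1))
```

This is also why PCIe Gen4 servers were an important enabler for 200 Gb/s networking: they deliver the same host bandwidth in half the lanes.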
DDN was showing high-performance storage solutions for AI and deep learning, benchmarks, customer use cases, and an interactive automated retail demo. The company was showing its next generation of A³I reference architectures, which include NVIDIA's DGX POD™ and DGX-2™ and DDN's AI400™ parallel storage appliance. A featured use case was improved testing results from the Max Delbrück Center (MDC) for Molecular Medicine, in conjunction with the Zuse Institute Berlin (ZIB). Using an A³I architecture with an NVIDIA DGX-1 system and an AI200™ from DDN, MDC was able to improve accuracy and precision within analyzed images while simultaneously accelerating training performance by 240 percent. More recent testing with an A³I architecture comprised of NVIDIA's DGX-2 system and DDN's AI400 revealed results that more than doubled previous improvements. The image below shows the A³I with NVIDIA DGX-1s.
WekaIO was at the 2019 GTC. The company's Matrix product was built to handle the demands of emerging workloads, including machine learning, image processing, technical computing, and HPC. The figure below shows performance data comparing Matrix with local SSDs and an all-flash NAS.
GTC showcases the latest advances in AI technology, and digital storage and memory play an important role in machine learning and other big data applications. Flash memory and other solid-state storage technologies, especially those using NVMe, are key elements for future AI applications.