NVIDIA GB200 NVL72 - Next-Gen AI Supercomputer
Highlights
Breakthrough Performance for AI Workloads
- LLM Inference: 30X vs. NVIDIA H100 Tensor Core GPU
- LLM Training: 4X vs. H100
- Energy Efficiency: 25X vs. H100
- Data Processing: 18X vs. CPU
LLM inference and energy efficiency: token-to-token latency (TTL) = 50 milliseconds (ms) real time, first token latency (FTL) = 5 s, 32,768 input tokens / 1,024 output tokens; NVIDIA HGX™ H100 scaled over InfiniBand (IB) vs. GB200 NVL72. LLM training: 1.8T-parameter MoE model, 4,096x HGX H100 scaled over IB vs. 456x GB200 NVL72 scaled over IB. Cluster size: 32,768.
Data processing benchmark: a database join-and-aggregation workload with Snappy/Deflate compression, derived from the TPC-H Q4 query. Custom query implementations for x86, a single H100 GPU, and a single GPU from GB200 NVL72 vs. Intel Xeon 8480+. Projected performance, subject to change.
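In essence, the TPC-H Q4-derived workload is a semi-join of orders against late line items followed by a grouped count. A minimal Python sketch of that query shape on synthetic data (table and column names follow the public TPC-H schema; this illustrates the workload pattern, not NVIDIA's implementation):

```python
from datetime import date

# Synthetic TPC-H-style rows: orders(o_orderkey, o_orderdate, o_orderpriority)
orders = [
    (1, date(1993, 7, 1), "1-URGENT"),
    (2, date(1993, 8, 15), "3-MEDIUM"),
    (3, date(1993, 9, 30), "1-URGENT"),
    (4, date(1994, 1, 5), "2-HIGH"),   # outside the 3-month window
]
# lineitem(l_orderkey, l_commitdate, l_receiptdate)
lineitem = [
    (1, date(1993, 8, 1), date(1993, 8, 10)),   # late: received after commit
    (2, date(1993, 9, 1), date(1993, 8, 25)),   # on time
    (3, date(1993, 10, 1), date(1993, 10, 20)), # late
]

# EXISTS subquery as a semi-join: order keys with at least one late line item
late_orders = {key for key, commit, receipt in lineitem if commit < receipt}

# Date-range filter plus grouped count, as in TPC-H Q4
start, end = date(1993, 7, 1), date(1993, 10, 1)
counts = {}
for okey, odate, priority in orders:
    if start <= odate < end and okey in late_orders:
        counts[priority] = counts.get(priority, 0) + 1

print(sorted(counts.items()))  # [('1-URGENT', 2)]
```

The decompression engines and high-bandwidth memory cited above accelerate the scan and join phases of exactly this kind of query, where the working set is read once, filtered, and aggregated.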
Real-Time LLM Inference
The GB200 NVL72 delivers 30X faster real-time inference for trillion-parameter language models. Powered by second-generation Transformer Engine with FP4 AI and fifth-generation NVLink, it combines 1.4 exaFLOPS of AI performance with 30TB of high-speed memory in a single unified architecture.
Accelerated LLM Training
Train large language models 4X faster with the second-generation Transformer Engine featuring FP8 precision. Fifth-generation NVLink provides 1.8TB/s GPU-to-GPU interconnect, complemented by InfiniBand networking and NVIDIA Magnum IO software for maximum throughput.
Sustainable AI Infrastructure
Liquid cooling technology enables 25X better performance than H100 air-cooled systems at the same power consumption. This advanced cooling solution increases compute density, reduces datacenter footprint, and minimizes water usage while enabling high-bandwidth, low-latency GPU communication.
Accelerated Data Analytics
Speed up database queries by 18X compared to CPU with high-bandwidth memory, NVLink-C2C, and dedicated decompression engines. The NVIDIA Blackwell architecture delivers 5X better total cost of ownership for enterprise data processing workloads.
Features
Technological Breakthroughs
Blackwell Architecture
The NVIDIA Blackwell architecture delivers groundbreaking advancements in accelerated computing, powering a new era of computing with unparalleled performance, efficiency, and scale.
NVIDIA Grace CPU
The NVIDIA Grace CPU is a breakthrough processor designed for modern data centers running AI, cloud, and HPC applications. It provides outstanding performance and memory bandwidth with 2X the energy efficiency of today's leading server processors.
Fifth-Generation NVIDIA NVLink
Fifth-generation NVLink delivers 1.8TB/s GPU-to-GPU interconnect bandwidth, enabling seamless communication across massive GPU clusters for trillion-parameter AI models and exascale computing workloads.
NVIDIA Networking
NVIDIA Quantum-X800 InfiniBand and Spectrum-X800 Ethernet provide the high-performance networking backbone for distributed AI training and inference, enabling efficient scaling across thousands of Blackwell GPUs.
Specifications
GB200 NVL72 Specs¹
| | GB200 NVL72 | GB200 Grace Blackwell Superchip |
|---|---|---|
| Configuration | 36 Grace CPUs : 72 Blackwell GPUs | 1 Grace CPU : 2 Blackwell GPUs |
| FP4 Tensor Core² | 1,440 PFLOPS | 40 PFLOPS |
| FP8/FP6 Tensor Core² | 720 PFLOPS | 20 PFLOPS |
| INT8 Tensor Core² | 720 POPS | 20 POPS |
| FP16/BF16 Tensor Core² | 360 PFLOPS | 10 PFLOPS |
| TF32 Tensor Core² | 180 PFLOPS | 5 PFLOPS |
| FP32 | 6,480 TFLOPS | 180 TFLOPS |
| FP64 | 3,240 TFLOPS | 90 TFLOPS |
| FP64 Tensor Core | 3,240 TFLOPS | 90 TFLOPS |
| GPU Memory \| Bandwidth | Up to 13.5 TB HBM3e \| 576 TB/s | Up to 384 GB HBM3e \| 16 TB/s |
| NVLink Bandwidth | 130 TB/s | 3.6 TB/s |
| CPU Core Count | 2,592 Arm® Neoverse V2 cores | 72 Arm Neoverse V2 cores |
| LPDDR5X Memory \| Bandwidth | Up to 17 TB LPDDR5X \| Up to 18.4 TB/s | Up to 480 GB LPDDR5X \| Up to 512 GB/s |
¹ Preliminary specifications. May be subject to change.
² With sparsity.
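The rack-level column is the superchip column scaled by the 36 superchips per NVL72 rack. A quick sanity check of that scaling, with values transcribed from the table above (preliminary, and with sparsity where noted):

```python
SUPERCHIPS_PER_RACK = 36  # 36 Grace CPUs : 72 Blackwell GPUs

# (metric, per-superchip value, rack value) -- units as in the table
rows = [
    ("FP4 Tensor Core (PFLOPS)",       40,   1440),
    ("FP8/FP6 Tensor Core (PFLOPS)",   20,   720),
    ("FP16/BF16 Tensor Core (PFLOPS)", 10,   360),
    ("TF32 Tensor Core (PFLOPS)",      5,    180),
    ("FP32 (TFLOPS)",                  180,  6480),
    ("FP64 (TFLOPS)",                  90,   3240),
    ("NVLink bandwidth (TB/s)",        3.6,  130),   # 3.6 x 36 = 129.6, quoted as 130
    ("Arm Neoverse V2 cores",          72,   2592),
]

for name, per_chip, rack in rows:
    scaled = per_chip * SUPERCHIPS_PER_RACK
    # Allow ~1% slack for rounded marketing figures (e.g. 129.6 -> 130)
    assert abs(scaled - rack) <= 0.01 * rack, (name, scaled, rack)
    print(f"{name}: {per_chip} x 36 = {scaled} (table: {rack})")
```

The same scaling links the feature text to the table: 72 GPUs x 1.8 TB/s per-GPU NVLink is the 130 TB/s aggregate quoted for the rack.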
