NVIDIA GB200 NVL72 - Next-Gen AI Supercomputer
Highlights
Breakthrough Performance for AI Workloads
- LLM Inference: 30X vs. NVIDIA H100 Tensor Core GPU
- LLM Training: 4X vs. H100
- Energy Efficiency: 25X vs. H100
- Data Processing: 18X vs. CPU
LLM inference and energy efficiency: token-to-token latency (TTL) = 50 milliseconds (ms) real time, first token latency (FTL) = 5 s, 32,768 input tokens / 1,024 output tokens; NVIDIA HGX™ H100 scaled over InfiniBand (IB) vs. GB200 NVL72. LLM training: 1.8T-parameter MoE model, 4,096x HGX H100 scaled over IB vs. 456x GB200 NVL72 scaled over IB. Cluster size: 32,768.
Data processing benchmark: a database join-and-aggregation workload with Snappy/Deflate compression, derived from the TPC-H Q4 query. Custom query implementations for x86, a single H100 GPU, and a single GPU from GB200 NVL72 vs. Intel Xeon 8480+. Projected performance, subject to change.
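In essence, the TPC-H Q4-derived workload is a semi-join of orders against late line items followed by a grouped count. A minimal Python sketch of that query shape on synthetic data (table and column names follow the public TPC-H schema; this illustrates the workload pattern, not NVIDIA's implementation):

```python
from datetime import date

# Synthetic TPC-H-style rows: orders(o_orderkey, o_orderdate, o_orderpriority)
orders = [
    (1, date(1993, 7, 1), "1-URGENT"),
    (2, date(1993, 8, 15), "3-MEDIUM"),
    (3, date(1993, 9, 30), "1-URGENT"),
    (4, date(1994, 1, 5), "2-HIGH"),   # outside the 3-month window
]
# lineitem(l_orderkey, l_commitdate, l_receiptdate)
lineitem = [
    (1, date(1993, 8, 1), date(1993, 8, 10)),   # late: received after commit
    (2, date(1993, 9, 1), date(1993, 8, 25)),   # on time
    (3, date(1993, 10, 1), date(1993, 10, 20)), # late
]

# EXISTS subquery as a semi-join: order keys with at least one late line item
late_orders = {key for key, commit, receipt in lineitem if commit < receipt}

# Date-range filter plus grouped count, as in TPC-H Q4
start, end = date(1993, 7, 1), date(1993, 10, 1)
counts = {}
for okey, odate, priority in orders:
    if start <= odate < end and okey in late_orders:
        counts[priority] = counts.get(priority, 0) + 1

print(sorted(counts.items()))  # [('1-URGENT', 2)]
```

The decompression engines and high-bandwidth memory cited above accelerate the scan and join phases of exactly this kind of query, where the working set is read once, filtered, and aggregated.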
Real-Time LLM Inference
The GB200 NVL72 delivers 30X faster real-time inference for trillion-parameter language models. Powered by second-generation Transformer Engine with FP4 AI and fifth-generation NVLink, it combines 1.4 exaFLOPS of AI performance with 30TB of high-speed memory in a single unified architecture.
Accelerated LLM Training
Train large language models 4X faster with the second-generation Transformer Engine featuring FP8 precision. Fifth-generation NVLink provides 1.8TB/s GPU-to-GPU interconnect, complemented by InfiniBand networking and NVIDIA Magnum IO software for maximum throughput.
Sustainable AI Infrastructure
Liquid cooling technology enables 25X better performance than H100 air-cooled systems at the same power consumption. This advanced cooling solution increases compute density, reduces datacenter footprint, and minimizes water usage while enabling high-bandwidth, low-latency GPU communication.
Accelerated Data Analytics
Speed up database queries by 18X compared to CPU with high-bandwidth memory, NVLink-C2C, and dedicated decompression engines. The NVIDIA Blackwell architecture delivers 5X better total cost of ownership for enterprise data processing workloads.
Features
Technological Breakthroughs
Blackwell Architecture
The NVIDIA Blackwell architecture delivers groundbreaking advancements in accelerated computing, powering a new era of computing with unparalleled performance, efficiency, and scale.
NVIDIA Grace CPU
The NVIDIA Grace CPU is a breakthrough processor designed for modern data centers running AI, cloud, and HPC applications. It provides outstanding performance and memory bandwidth with 2X the energy efficiency of today's leading server processors.
Fifth-Generation NVIDIA NVLink
Fifth-generation NVLink delivers 1.8TB/s GPU-to-GPU interconnect bandwidth, enabling seamless communication across massive GPU clusters for trillion-parameter AI models and exascale computing workloads.
NVIDIA Networking
NVIDIA Quantum-X800 InfiniBand and Spectrum-X800 Ethernet provide the high-performance networking backbone for distributed AI training and inference, enabling efficient scaling across thousands of Blackwell GPUs.
Specifications
GB200 NVL72 Specs¹
| | GB200 NVL72 | GB200 Grace Blackwell Superchip |
|---|---|---|
| Configuration | 36 Grace CPUs : 72 Blackwell GPUs | 1 Grace CPU : 2 Blackwell GPUs |
| FP4 Tensor Core² | 1,440 PFLOPS | 40 PFLOPS |
| FP8/FP6 Tensor Core² | 720 PFLOPS | 20 PFLOPS |
| INT8 Tensor Core² | 720 POPS | 20 POPS |
| FP16/BF16 Tensor Core² | 360 PFLOPS | 10 PFLOPS |
| TF32 Tensor Core² | 180 PFLOPS | 5 PFLOPS |
| FP32 | 6,480 TFLOPS | 180 TFLOPS |
| FP64 | 3,240 TFLOPS | 90 TFLOPS |
| FP64 Tensor Core | 3,240 TFLOPS | 90 TFLOPS |
| GPU Memory \| Bandwidth | Up to 13.5 TB HBM3e \| 576 TB/s | Up to 384 GB HBM3e \| 16 TB/s |
| NVLink Bandwidth | 130 TB/s | 3.6 TB/s |
| CPU Core Count | 2,592 Arm® Neoverse V2 cores | 72 Arm Neoverse V2 cores |
| LPDDR5X Memory \| Bandwidth | Up to 17 TB LPDDR5X \| Up to 18.4 TB/s | Up to 480 GB LPDDR5X \| Up to 512 GB/s |
¹ Preliminary specifications. May be subject to change.
² With sparsity.
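The rack-level column is the superchip column scaled by the 36 superchips per NVL72 rack. A quick sanity check of that scaling, with values transcribed from the table above (preliminary, and with sparsity where noted):

```python
SUPERCHIPS_PER_RACK = 36  # 36 Grace CPUs : 72 Blackwell GPUs

# (metric, per-superchip value, rack value) -- units as in the table
rows = [
    ("FP4 Tensor Core (PFLOPS)",       40,   1440),
    ("FP8/FP6 Tensor Core (PFLOPS)",   20,   720),
    ("FP16/BF16 Tensor Core (PFLOPS)", 10,   360),
    ("TF32 Tensor Core (PFLOPS)",      5,    180),
    ("FP32 (TFLOPS)",                  180,  6480),
    ("FP64 (TFLOPS)",                  90,   3240),
    ("NVLink bandwidth (TB/s)",        3.6,  130),   # 3.6 x 36 = 129.6, quoted as 130
    ("Arm Neoverse V2 cores",          72,   2592),
]

for name, per_chip, rack in rows:
    scaled = per_chip * SUPERCHIPS_PER_RACK
    # Allow ~1% slack for rounded marketing figures (e.g. 129.6 -> 130)
    assert abs(scaled - rack) <= 0.01 * rack, (name, scaled, rack)
    print(f"{name}: {per_chip} x 36 = {scaled} (table: {rack})")
```

The same scaling links the feature text to the table: 72 GPUs x 1.8 TB/s per-GPU NVLink is the 130 TB/s aggregate quoted for the rack.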
