
Blackwell: Powering the Factories of Intelligence
Blackwell is more than a GPU — it’s the backbone of an entire system architecture built to fuel the world’s AI factories. These massive computing engines produce intelligence at unprecedented scale, training and serving the largest AI models ever created.
Today’s frontier AI models already stretch into hundreds of billions of parameters, serving nearly a billion users weekly. The next generation will push well beyond a trillion parameters, trained on datasets with tens of trillions of tokens spanning text, images, and video.
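A rough back-of-envelope calculation shows why models at this scale outgrow any single chip. The figures below (parameter count, weight precision, per-GPU memory) are illustrative assumptions, not specifications of any particular model or GPU:

```python
# Back-of-envelope: why a trillion-parameter model cannot fit on one GPU.
# All figures are illustrative assumptions, not product specifications.

params = 1e12          # assumed: a 1-trillion-parameter model
bytes_per_param = 2    # assumed: 16-bit (BF16) weights
hbm_per_gpu_gb = 192   # assumed: HBM capacity of a single modern GPU, in GB

weights_gb = params * bytes_per_param / 1e9
print(f"Weights alone: {weights_gb:,.0f} GB")                        # 2,000 GB
print(f"GPUs needed just to hold them: {weights_gb / hbm_per_gpu_gb:.0f}+")
```

Training adds optimizer states, gradients, and activations on top of the weights, typically several times the weight footprint, which is why these workloads must span many chips and many systems.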
Meeting this demand requires scaling out: data centers with thousands of systems working in parallel. But even greater gains come from scaling up: building a bigger, more efficient single computer. That’s where Blackwell redefines the limits.
The Most Demanding Computing Challenge
AI factories are the machines of the next industrial revolution. Their product is intelligence, and their core task is AI inference — the most demanding computing workload today.
To succeed, these factories need infrastructure that can flex, expand, and maximize every ounce of compute power. That means harmonizing compute, networking, storage, power, and cooling — integrated from silicon to system racks — and unified by software that treats tens of thousands of Blackwell GPUs as one.
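A minimal sketch of that “many GPUs as one” idea is shown below, using PyTorch’s NCCL backend for collective communication. It illustrates the underlying pattern only; it is not NVIDIA’s factory software stack, and it assumes a single multi-GPU node:

```python
# Sketch: many GPUs acting as one through collective communication.
# Illustrative only; launch with: torchrun --nproc_per_node=<num_gpus> sketch.py
import torch
import torch.distributed as dist

dist.init_process_group(backend="nccl")  # NCCL uses NVLink where available
rank = dist.get_rank()
torch.cuda.set_device(rank)              # single-node assumption: rank == local GPU

# Each GPU computes a local partial result...
local = torch.ones(4, device="cuda") * rank

# ...and one all-reduce places the summed result on every GPU,
# as if a single large device had done the whole computation.
dist.all_reduce(local, op=dist.ReduceOp.SUM)
print(f"rank {rank}: {local}")

dist.destroy_process_group()
```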
At the heart of this transformation is the NVIDIA GB200 NVL72, a rack-scale system designed to function as a single massive GPU.
The Birth of a Superchip
Central to the system is the NVIDIA Grace Blackwell superchip, which fuses two Blackwell GPUs with one NVIDIA Grace CPU into a unified compute module.
This design delivers an order-of-magnitude leap in performance, made possible by the NVIDIA NVLink-C2C chip-to-chip interconnect, first introduced with the Grace Hopper superchip. NVLink-C2C gives the CPU and GPUs coherent, direct access to each other’s memory, reducing latency and boosting throughput for massive AI workloads.
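As a rough illustration of what the coherent link buys, the sketch below compares the time to move a large CPU-resident working set over NVLink-C2C versus a conventional PCIe Gen 5 path. The bandwidths are approximate public figures, and the payload size is an assumption:

```python
# Rough comparison: moving data between CPU and GPU memory.
# Bandwidths are approximate public figures; the payload size is assumed.

payload_gb = 480           # assumed: a large CPU-resident working set, in GB
links = {
    "NVLink-C2C":    900,  # ~900 GB/s between Grace CPU and Blackwell GPUs
    "PCIe Gen5 x16": 128,  # ~128 GB/s bidirectional on a conventional path
}

for name, bw_gbps in links.items():
    print(f"{name}: {payload_gb / bw_gbps:.2f} s to move {payload_gb} GB")
# NVLink-C2C: ~0.53 s; PCIe Gen5 x16: ~3.75 s for the same payload
```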
A New Interconnect for the Superchip Era
Scaling across multiple superchips without bottlenecks required a new approach. NVIDIA’s answer: the NVLink Switch spine.
This backbone connects 72 GPUs across 18 compute trays with over 5,000 high-performance copper cables, delivering a staggering 130 TB/s of bandwidth. That’s fast enough to move the internet’s peak traffic in less than a second.
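The headline number follows from simple per-GPU arithmetic; the per-GPU figure below is the publicly stated bandwidth of fifth-generation NVLink:

```python
# Sanity check: aggregate NVLink bandwidth of a 72-GPU NVLink domain.
gpus = 72
nvlink5_per_gpu_tbps = 1.8  # fifth-gen NVLink, TB/s per GPU (bidirectional)

print(f"{gpus * nvlink5_per_gpu_tbps:.1f} TB/s")  # 129.6, i.e. ~130 TB/s
```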
One Giant GPU for Inference
All of this comes together in the GB200 NVL72 system — a feat of engineering weighing 1.5 tons, packed with 600,000 components, two miles of wiring, and millions of lines of code.
The result: a system that operates as a single giant virtual GPU, purpose-built for factory-scale AI inference where every nanosecond and watt counts.