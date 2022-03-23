New GPU has big performance boost along with increased power consumption

Today, at the GPU Technology Conference (GTC) event, Nvidia revealed details of its Hopper architecture and the Nvidia H100 GPU. The Hopper architecture and H100 GPU should not be confused with Ada, the consumer-focused architecture that will power future GeForce cards.

Nvidia has yet to reveal details about the Ada architecture. The Hopper H100 will replace the Ampere A100, which itself replaced the Volta V100. These are all datacenter parts, and with fiercer competition from products like AMD’s Instinct MI250/MI250X cards and the recently announced Instinct MI210, Nvidia is looking to retake the lead in high-performance computing.

As you would expect given its lineage, the H100 was designed for supercomputers, with a focus on artificial intelligence capabilities. It includes numerous updates and upgrades over the current A100, all designed to reach new levels of performance and efficiency.

The H100 contains 80 billion transistors and is built on a custom TSMC 4N process, which differs slightly from the 4nm N4 process that TSMC also offers. For comparison, the A100 GPU had ‘only’ 54 billion transistors.



Nvidia has not revealed the new card’s core counts or clock speeds, but it has given some other details. The H100 supports Nvidia’s fourth-generation NVLink interface, which can provide up to 900 GB/s of bandwidth. It also supports PCIe 5.0 for systems that don’t use NVLink, which tops out at 128 GB/s. The updated NVLink connection provides 1.5 times the bandwidth of the A100’s link, while PCIe 5.0 offers twice the bandwidth of PCIe 4.0.
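The 128 GB/s PCIe 5.0 figure and the doubling over PCIe 4.0 can be reproduced with a little arithmetic. The sketch below assumes an x16 slot, the per-lane transfer rates from the PCIe specifications (16 GT/s for 4.0, 32 GT/s for 5.0) and 128b/130b line encoding. None of these are stated in the article, and the function name is ours.

```python
def pcie_bandwidth_gbs(gt_per_s, lanes=16, encoding=128 / 130):
    """Return (per-direction, bidirectional) bandwidth in GB/s.

    gt_per_s: per-lane transfer rate in GT/s (assumed: 16 for PCIe 4.0, 32 for 5.0)
    lanes:    link width (assumed x16, typical for a GPU slot)
    encoding: fraction of transferred bits that carry payload (128b/130b)
    """
    per_direction = gt_per_s * lanes * encoding / 8  # 8 bits per byte
    return per_direction, 2 * per_direction


gen4 = pcie_bandwidth_gbs(16)
gen5 = pcie_bandwidth_gbs(32)
gen5_raw = pcie_bandwidth_gbs(32, encoding=1.0)  # ignore encoding overhead

print(f"PCIe 4.0 x16: {gen4[1]:.1f} GB/s bidirectional")
print(f"PCIe 5.0 x16: {gen5[1]:.1f} GB/s bidirectional")
print(f"PCIe 5.0 x16 raw: {gen5_raw[1]:.1f} GB/s bidirectional")
print(f"Gen5 / Gen4: {gen5[1] / gen4[1]:.1f}x")
```

Under these assumptions, the headline 128 GB/s is the raw bidirectional rate of an x16 link. The usable figure after encoding overhead is slightly lower, at roughly 126 GB/s.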



Big leap in performance…

The H100 will also support 80GB of HBM3 memory with 3TB/s of bandwidth, which is 1.5 times faster than the A100’s HBM2E. The A100 was available in 40GB and 80GB models, with the latter coming to market later, and both the H100 and A100 use up to six stacks of HBM. Overall, the H100 has 50% more memory bandwidth and interconnect bandwidth than its predecessor.

That’s a nice improvement, but other aspects of the Hopper GPU involve even greater boosts. The H100 can provide up to 2,000 TFLOPS of FP16 compute and 1,000 TFLOPS of TF32 compute, as well as 60 TFLOPS of general-purpose FP64 compute, which is triple the performance of the A100 in all three cases.

The Hopper GPU also adds FP8 support with up to 4,000 TFLOPS of compute, six times faster than the A100 (which had to rely on FP16, as it lacks native FP8 support). To help optimize performance, Nvidia has also added a new Transformer Engine that automatically switches between FP8 and FP16 formats based on the workload.



Nvidia has also added new DPX instructions designed to speed up dynamic programming. This can help with a wide variety of algorithms, including route optimization and genomics. Nvidia claims that these algorithms run up to 7 times faster on the Hopper H100 than on its previous-generation GPUs, and up to 40 times faster than CPU-based implementations. Hopper also includes changes to improve security, and in virtualized environments a single H100 GPU can now be partitioned into seven secure instances.
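To give a sense of what "dynamic programming" means here, the sketch below shows a classic route-optimization recurrence, the Floyd-Warshall all-pairs shortest-path algorithm. It is plain Python and does not use DPX in any way. The article does not describe the instructions themselves, so treating this add-then-min inner step as the kind of operation they target is our assumption, and the example graph is made up.

```python
import math

INF = math.inf


def floyd_warshall(weights):
    """All-pairs shortest paths on a dense weight matrix (INF = no edge)."""
    n = len(weights)
    dist = [row[:] for row in weights]  # copy so the input is left untouched
    for k in range(n):          # allow node k as an intermediate stop
        for i in range(n):
            for j in range(n):
                # Core DP step: an addition followed by a minimum.
                dist[i][j] = min(dist[i][j], dist[i][k] + dist[k][j])
    return dist


# Hypothetical 4-node directed graph, for illustration only.
graph = [
    [0,   4,   INF, 10],
    [INF, 0,   3,   INF],
    [INF, INF, 0,   2],
    [INF, INF, INF, 0],
]

shortest = floyd_warshall(graph)
print(shortest[0][3])  # cheapest route from node 0 to node 3
```

The triple loop performs the same add-then-min step n³ times, which is why hardware support for that pattern could plausibly matter for large problems.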

… and consumption

All of these changes are important to Nvidia’s supercomputing and AI goals. Despite the move to a smaller manufacturing node, the TDP of the H100 for the SXM variant has been increased to 700W, compared to 400W for the A100 SXM modules. That’s 75% more power, for improvements that seem to range between 50% and 500% depending on the workload. Overall, we expect performance to be two to three times faster than the Nvidia A100, so there should still be a net improvement in efficiency, but it’s more evidence of Moore’s Law slowing.
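The efficiency argument above is simple arithmetic and can be checked directly. The sketch below uses only the TDP figures quoted in the article and treats TDP as a stand-in for actual power draw, which is an assumption. The function names are ours.

```python
H100_TDP_W = 700  # SXM variant, from the article
A100_TDP_W = 400  # SXM variant, from the article


def power_ratio(new_w=H100_TDP_W, old_w=A100_TDP_W):
    """How many times more power the new part is rated for."""
    return new_w / old_w


def efficiency_gain(speedup, new_w=H100_TDP_W, old_w=A100_TDP_W):
    """Performance per watt of the new part relative to the old one."""
    return speedup / power_ratio(new_w, old_w)


print(f"Power increase: {(power_ratio() - 1) * 100:.0f}%")

# 50% to 500% improvement corresponds to speedups of 1.5x to 6x.
for speedup in (1.5, 2.0, 3.0, 6.0):
    gain = efficiency_gain(speedup)
    print(f"{speedup:.1f}x faster -> {gain:.2f}x performance per watt")
```

The break-even point is a 1.75x speedup. Workloads at the low end of the quoted range (1.5x) would be less efficient per watt than on the A100, while the expected two to three times speedup works out to roughly 1.1x to 1.7x better efficiency.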

Overall, Nvidia claims that the H100 can provide up to 9 times the A100’s throughput in AI training. It also claims 16 to 30 times more inference performance, using Megatron 530B throughput as the benchmark. Finally, in HPC applications like 3D FFT (Fast Fourier Transform) and genome sequencing, Nvidia says the H100 is up to 7 times faster than the A100.

The Nvidia H100 GPU is only part of the story, of course. As with the A100, Hopper will initially be available in a new DGX H100 rackmount server. Each DGX H100 system contains eight H100 GPUs, providing up to 32 PFLOPS of AI compute and 0.5 PFLOPS of FP64, with 640GB of HBM3 memory. The DGX H100 also has 3.6 TB/s of bandwidth.

Using multiple DGX H100 servers, Nvidia scales up to a DGX SuperPod with 32 DGX H100 systems, linked by an upgraded NVLink Switch system and a Quantum-2 InfiniBand network. A single H100 SuperPod features 256 H100 GPUs, 20TB of HBM3 memory and up to 1 ExaFLOPS of AI compute, while also delivering 70.4TB/s of bandwidth.

Of course, supercomputers can be built using multiple SuperPods, and Nvidia has announced its new Eos supercomputer, which follows in Selene’s footsteps. Eos will be built from 18 H100 SuperPods, with 576 DGX H100 systems and 360 NVLink switches, and will provide 275 PFLOPS of FP64 compute. More importantly for Nvidia’s AI-focused future, it will provide 18 EFLOPS of FP8 compute for AI, or 9 EFLOPS of FP16.
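The DGX, SuperPod and Eos totals all follow from multiplying the per-GPU peak figures by the GPU count. The sketch below redoes that arithmetic using only numbers quoted in the article. It assumes perfect linear scaling of peak throughput, which real workloads will not achieve, and the helper name is ours.

```python
GPUS_PER_DGX = 8
DGX_PER_SUPERPOD = 32
SUPERPODS_IN_EOS = 18

# Per-GPU peak figures quoted in the article.
FP8_TFLOPS = 4000
FP16_TFLOPS = 2000
FP64_TFLOPS = 60
HBM_GB = 80


def totals(num_gpus):
    """Aggregate peak figures for a system with num_gpus H100 GPUs."""
    return {
        "gpus": num_gpus,
        "fp8_pflops": num_gpus * FP8_TFLOPS / 1000,
        "fp16_pflops": num_gpus * FP16_TFLOPS / 1000,
        "fp64_pflops": num_gpus * FP64_TFLOPS / 1000,
        "hbm_gb": num_gpus * HBM_GB,
    }


dgx = totals(GPUS_PER_DGX)
superpod = totals(GPUS_PER_DGX * DGX_PER_SUPERPOD)
eos = totals(GPUS_PER_DGX * DGX_PER_SUPERPOD * SUPERPODS_IN_EOS)

for name, system in (("DGX H100", dgx), ("SuperPod", superpod), ("Eos", eos)):
    print(name, system)
```

The results line up with the quoted figures once rounding is taken into account: 32 PFLOPS of FP8 and 640GB per DGX, 256 GPUs and about 1 EFLOPS of FP8 per SuperPod, and 4,608 GPUs with roughly 18 EFLOPS of FP8 and 276 PFLOPS of FP64 for Eos.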

