Understanding Nvidia’s NVLink and NVSwitch Evolution: Topology and Rates

2016: Introduction of Pascal Architecture with Tesla P100

In 2016, Nvidia launched the Tesla P100 based on the Pascal architecture. This GPU featured first-generation NVLink technology, enabling high-speed communication between 4 or 8 GPUs. NVLink 1.0’s bidirectional interconnect bandwidth was five times that of PCIe 3.0×16. Here’s the calculation:

  • PCIe 3.0×16: Bidirectional communication bandwidth of 32GB/s (1GB/s × 16 lanes × 2 directions).
  • NVLink 1.0: Bidirectional interconnect bandwidth of 160GB/s (20GB/s × 4 links × 2 directions).
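
To make the arithmetic explicit, here is a minimal Python sketch of the two figures above (illustrative only; the per-lane and per-link rates are those quoted in the bullets):

```python
# Rates quoted above, in GB/s per direction.
PCIE3_PER_LANE_GBPS = 1     # PCIe 3.0: ~1 GB/s per lane per direction
NVLINK1_PER_LINK_GBPS = 20  # NVLink 1.0: 20 GB/s per link per direction

def bidirectional_gbps(per_direction_gbps, count):
    """Total bidirectional bandwidth = per-direction rate x lane/link count x 2."""
    return per_direction_gbps * count * 2

pcie = bidirectional_gbps(PCIE3_PER_LANE_GBPS, 16)     # 32 GB/s
nvlink = bidirectional_gbps(NVLINK1_PER_LINK_GBPS, 4)  # 160 GB/s

print(f"PCIe 3.0 x16 : {pcie} GB/s")
print(f"NVLink 1.0 x4: {nvlink} GB/s")
print(f"Ratio        : {nvlink // pcie}x")  # 5x
```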

Due to the absence of NVSwitch chips, the GPUs were interconnected in a mesh topology, where 160GB/s represents the aggregate bandwidth from one GPU to its four directly connected GPUs.

Pascal Architecture with Tesla P100

2017: Volta Architecture with V100

In 2017, Nvidia released the Volta architecture with the V100 GPU. The V100’s NVLink 2.0 increased the per-link unidirectional bandwidth from 20GB/s to 25GB/s and the number of links per GPU from 4 to 6, raising the total bidirectional NVLink bandwidth per GPU to 300GB/s. However, the V100 DGX-1 system released in 2017 still did not feature NVSwitch; the topology was similar to the NVLink 1.0 generation, just with more links.
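
The same arithmetic yields the 300GB/s figure for the V100 (a short sketch using the rates stated above):

```python
# NVLink 2.0 on V100: 25 GB/s per link per direction, 6 links per GPU.
per_link_unidir_gbps = 25
links_per_gpu = 6

total_bidir_gbps = per_link_unidir_gbps * links_per_gpu * 2
print(f"V100 total NVLink bandwidth: {total_bidir_gbps} GB/s")  # 300 GB/s
```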

Volta Architecture with V100

2018: Introduction of V100 DGX-2 System

To further enhance inter-GPU communication bandwidth and overall system performance, Nvidia introduced the V100 DGX-2 system in 2018. This was the first system to incorporate the NVSwitch chip, enabling full interconnectivity among the 16 SXM V100 GPUs within a single DGX-2 system.

V100 DGX-2 System

Each NVSwitch chip has 18 NVLink ports: 8 connect to the GPUs on its own baseboard and 8 connect to the corresponding NVSwitch chip on the other baseboard. Each baseboard contains six NVSwitches for communication with the other baseboard.
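
As a rough port-budget check for one DGX-2 baseboard (a sketch based on the counts above; the two leftover ports per NVSwitch are simply what remains of the 18-port total in this wiring):

```python
# Port budget for one NVSwitch chip on a DGX-2 baseboard.
GPUS_PER_BASEBOARD = 8
NVSWITCHES_PER_BASEBOARD = 6
NVLINKS_PER_V100 = 6
PORTS_PER_NVSWITCH = 18

# Each V100 spreads its 6 links across the 6 NVSwitches (one link to each),
# so every NVSwitch sees one link from each of the 8 GPUs.
gpu_facing = GPUS_PER_BASEBOARD * NVLINKS_PER_V100 // NVSWITCHES_PER_BASEBOARD
cross_baseboard = 8  # links to the corresponding NVSwitch on the other baseboard
leftover = PORTS_PER_NVSWITCH - gpu_facing - cross_baseboard

print(gpu_facing)       # 8 GPU-facing ports
print(cross_baseboard)  # 8 baseboard-to-baseboard ports
print(leftover)         # 2 ports left over in this wiring
```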


2020: Ampere Architecture with A100

In 2020, Nvidia launched the Ampere architecture with the A100 GPU. The NVLink and NVSwitch chips were upgraded to versions 3.0 and 2.0, respectively. Although the per-link unidirectional bandwidth remained at 25GB/s, the number of links increased to 12, resulting in a total bidirectional interconnect bandwidth of 600GB/s. The DGX A100 system features 6 NVSwitch 2.0 chips, with each A100 GPU interconnected via 12 NVLink connections to the 6 NVSwitch chips, ensuring two links to each NVSwitch.
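
A short sketch of the DGX A100 link math described above (illustrative only):

```python
# DGX A100: 12 NVLinks per GPU spread across 6 NVSwitch 2.0 chips.
nvlinks_per_a100 = 12
nvswitch_chips = 6
per_link_unidir_gbps = 25

links_per_switch = nvlinks_per_a100 // nvswitch_chips           # 2 links per NVSwitch
total_bidir_gbps = per_link_unidir_gbps * nvlinks_per_a100 * 2  # 600 GB/s

print(f"Links to each NVSwitch: {links_per_switch}")
print(f"Total bidirectional bandwidth: {total_bidir_gbps} GB/s")
```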

The logical topology of the GPU system is as follows:

logical topology of the GPU system

Many people are unclear about the logical relationship between the HGX module and the “server head.” Below is a diagram showing that the SXM GPU baseboard is interconnected with the server motherboard through PCIe links. The PCIe switch (PCIeSw) chip is integrated into the server head motherboard. Both the network card and NVMe U.2 PCIe signals also originate from the PCIeSw.

the logical relationship between the HGX module and the server head

2022: Hopper Architecture with H100

The H100 GPU, based on the Hopper architecture, was released in 2022 with NVLink and NVSwitch versions 4.0 and 3.0, respectively. While the per-link unidirectional bandwidth remained unchanged at 25GB/s, the number of links per GPU increased to 18, resulting in a total bidirectional interconnect bandwidth of 900GB/s. Each GPU connects to 4 NVSwitch chips, with its 18 links distributed in a 5+4+4+5 grouping.
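
The 5+4+4+5 grouping and the 900GB/s figure can be checked in a few lines (a sketch using the numbers above):

```python
# DGX H100: 18 NVLinks per GPU across 4 NVSwitch 3.0 chips, grouped 5+4+4+5.
grouping = [5, 4, 4, 5]
per_link_unidir_gbps = 25

assert sum(grouping) == 18  # all 18 links are accounted for
total_bidir_gbps = per_link_unidir_gbps * sum(grouping) * 2
print(f"H100 total NVLink bandwidth: {total_bidir_gbps} GB/s")  # 900 GB/s
```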

Hopper Architecture with H100

The OSFP interfaces of the NVSwitch chips in the DGX system extend the NVLink network beyond a single system, as in the 256-GPU DGX H100 SuperPOD solution.

DGX H100 256 SuperPOD

2024: Blackwell Architecture with B200

In 2024, Nvidia introduced the Blackwell architecture with the B200 GPU, featuring NVLink and NVSwitch versions 5.0 and 4.0, respectively. The per-link unidirectional bandwidth doubled to 50GB/s, and with 18 links per GPU the total bidirectional interconnect bandwidth reaches 1.8TB/s. Each NVSwitch chip has 72 NVLink 5.0 ports, and each GPU runs 9 NVLink connections to each of the two NVSwitch chips.
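
A quick sketch of the NVLink 5.0 arithmetic described above:

```python
# NVLink 5.0 on B200: 50 GB/s per link per direction, 18 links per GPU.
per_link_unidir_gbps = 50
links_per_gpu = 18
nvswitch_chips = 2

total_bidir_gbps = per_link_unidir_gbps * links_per_gpu * 2  # 1800 GB/s = 1.8 TB/s
links_per_switch = links_per_gpu // nvswitch_chips           # 9 links to each NVSwitch

print(f"Total bidirectional bandwidth: {total_bidir_gbps} GB/s")
print(f"Links to each NVSwitch chip: {links_per_switch}")
```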

Blackwell Architecture with B200

With the B200 release, Nvidia also introduced the NVL72, an integrated GPU system that uses NVLink Switch trays to achieve full interconnectivity among 72 GPUs.

The logical topology for interconnecting the 72 GPUs using 9 NVLink Switches is as follows:

72 GPUs using 9 NVLink Switches

Each B200 GPU has 18 NVLink ports, resulting in a total of 1,296 NVLink connections (72×18). A single Switch Tray contains two NVLink Switch chips, each providing 72 interfaces (144 total). Thus, 9 Switch Trays are required to interconnect the 72 GPUs fully.
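
The same port accounting, spelled out as a short sketch:

```python
# NVL72 port accounting, following the counts above.
gpus = 72
nvlinks_per_gpu = 18
ports_per_switch_chip = 72
chips_per_switch_tray = 2

gpu_side_ports = gpus * nvlinks_per_gpu                         # 1296
ports_per_tray = ports_per_switch_chip * chips_per_switch_tray  # 144
switch_trays_needed = gpu_side_ports // ports_per_tray          # 9

print(f"GPU-side NVLink ports: {gpu_side_ports}")
print(f"Ports per Switch Tray: {ports_per_tray}")
print(f"Switch Trays required: {switch_trays_needed}")
```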
