2016: Introduction of Pascal Architecture with Tesla P100
In 2016, Nvidia launched the Tesla P100 based on the Pascal architecture. This GPU featured first-generation NVLink technology, enabling high-speed communication among 4 or 8 GPUs. NVLink 1.0’s bidirectional interconnect bandwidth was five times that of PCIe 3.0×16. Here’s the calculation:
- PCIe 3.0×16: bidirectional bandwidth of 32GB/s (≈1GB/s per lane per direction × 16 lanes × 2 directions).
- NVLink 1.0: bidirectional interconnect bandwidth of 160GB/s (20GB/s per link per direction × 4 links × 2 directions).
Because there was no NVSwitch chip yet, the GPUs were interconnected in a mesh topology, where 160GB/s represents the total bandwidth from one GPU to its four directly connected GPUs.
![Pascal Architecture with Tesla P100](https://www.fibermall.com/blog/wp-content/uploads/2025/02/Pascal-Architecture-with-Tesla-P100-1024x601.png)
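As a quick sanity check on the figures above, here is a minimal Python sketch that recomputes both bidirectional bandwidths and their ratio, using the per-lane and per-link rates quoted in the list:

```python
# Recompute the PCIe 3.0 x16 vs. NVLink 1.0 comparison from the figures above.

def bidirectional_bw(gb_per_s: float, lanes: int) -> float:
    """Aggregate bidirectional bandwidth: per-lane rate x lane count x 2 directions."""
    return gb_per_s * lanes * 2

pcie3_x16 = bidirectional_bw(1.0, 16)   # ~1 GB/s per lane per direction, 16 lanes
nvlink1 = bidirectional_bw(20.0, 4)     # 20 GB/s per link per direction, 4 links

print(f"PCIe 3.0 x16: {pcie3_x16:.0f} GB/s")        # 32 GB/s
print(f"NVLink 1.0:   {nvlink1:.0f} GB/s")          # 160 GB/s
print(f"Ratio:        {nvlink1 / pcie3_x16:.0f}x")  # 5x
```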
2017: Volta Architecture with V100
In 2017, Nvidia released the Volta architecture with the V100 GPU. The V100’s NVLink 2.0 increased the per-link unidirectional bandwidth from 20GB/s to 25GB/s and the number of links from 4 to 6, raising the total per-GPU NVLink bandwidth to 300GB/s. However, the V100 DGX-1 system released in 2017 still did not feature NVSwitch; the topology was similar to that of NVLink 1.0, just with more links.
![Volta Architecture with V100](https://www.fibermall.com/blog/wp-content/uploads/2025/02/Volta-Architecture-with-V100-1-1024x729.png)
2018: Introduction of V100 DGX-2 System
To further enhance inter-GPU communication bandwidth and overall system performance, Nvidia introduced the V100 DGX-2 system in 2018. This was the first system to incorporate the NVSwitch chip, enabling full interconnectivity among the 16 SXM V100 GPUs within a single DGX-2 system.
![V100 DGX-2 System](https://www.fibermall.com/blog/wp-content/uploads/2025/02/V100-DGX-2-System.png)
Each NVSwitch has 18 NVLink ports: 8 connect to the GPUs on its own baseboard and 8 connect to a peer NVSwitch chip on the other baseboard, leaving 2 ports unused. Each baseboard contains six NVSwitches, which handle communication with the other baseboard.
![Each baseboard contains six NVSwitches for communication with another baseboard.](https://www.fibermall.com/blog/wp-content/uploads/2025/02/Each-baseboard-contains-six-NVSwitches-for-communication-with-another-baseboard-1024x676.png)
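A short sketch of the port budget this implies, assuming 8 GPUs and 6 NVSwitch chips per baseboard, with each V100 spending one of its 6 NVLinks on each switch:

```python
# DGX-2 baseboard port budget, based on the counts given above.
GPUS_PER_BASEBOARD = 8
SWITCHES_PER_BASEBOARD = 6
LINKS_PER_V100 = 6          # NVLink 2.0: 6 links per GPU
PORTS_PER_NVSWITCH = 18     # NVSwitch 1.0 port count

gpu_facing_ports = 8        # per switch, one port for each GPU on the baseboard
cross_baseboard_ports = 8   # per switch, toward the peer switch on the other baseboard
unused_ports = PORTS_PER_NVSWITCH - gpu_facing_ports - cross_baseboard_ports

# Every GPU reaches every switch once: 8 GPUs x 6 links == 6 switches x 8 GPU-facing ports.
assert GPUS_PER_BASEBOARD * LINKS_PER_V100 == SWITCHES_PER_BASEBOARD * gpu_facing_ports

print(f"Unused ports per NVSwitch: {unused_ports}")  # 2
```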
2020: Ampere Architecture with A100
In 2020, Nvidia launched the Ampere architecture with the A100 GPU. The NVLink and NVSwitch chips were upgraded to versions 3.0 and 2.0, respectively. Although the per-link unidirectional bandwidth remained at 25GB/s, the number of links per GPU increased to 12, resulting in a total bidirectional interconnect bandwidth of 600GB/s. The DGX A100 system features 6 NVSwitch 2.0 chips, with each A100 GPU connected to the 6 NVSwitch chips via its 12 NVLinks, two links to each switch.
The logical topology of the GPU system is as follows:
![logical topology of the GPU system](https://www.fibermall.com/blog/wp-content/uploads/2025/02/logical-topology-of-the-GPU-system-722x1024.png)
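The link distribution described above can be verified with a few lines of Python, using the NVLink 3.0 figures quoted in this section:

```python
# DGX A100: 12 NVLink 3.0 links per GPU, spread evenly over 6 NVSwitch 2.0 chips.
PER_LINK_UNIDIR_GBPS = 25
LINKS_PER_A100 = 12
NVSWITCH_CHIPS = 6

links_per_switch = LINKS_PER_A100 // NVSWITCH_CHIPS    # 2 links to each switch
bidir_bw = PER_LINK_UNIDIR_GBPS * LINKS_PER_A100 * 2   # x2 for both directions

print(f"Links per NVSwitch per GPU: {links_per_switch}")           # 2
print(f"Per-GPU bidirectional NVLink bandwidth: {bidir_bw} GB/s")  # 600 GB/s
```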
Many people are unclear about the logical relationship between the HGX module and the “server head.” The diagram below shows how the SXM GPU baseboard is connected to the server motherboard through PCIe links. The PCIe switch (PCIeSw) chip is integrated into the server-head motherboard, and the PCIe signals for the network cards and the NVMe U.2 drives also originate from the PCIeSw.
![the logical relationship between the HGX module and the server head](https://www.fibermall.com/blog/wp-content/uploads/2025/02/the-logical-relationship-between-the-HGX-module-and-the-server-head-1024x579.png)
2022: Hopper Architecture with H100
The H100 GPU, based on the Hopper architecture, was released in 2022 with NVLink and NVSwitch versions 4.0 and 3.0, respectively. While the per-link unidirectional bandwidth remained unchanged at 25GB/s, the number of links per GPU increased to 18, resulting in a total bidirectional interconnect bandwidth of 900GB/s. Each GPU connects to the 4 NVSwitch chips with its 18 links split in a 5+4+4+5 grouping.
![Hopper Architecture with H100](https://www.fibermall.com/blog/wp-content/uploads/2025/02/Hopper-Architecture-with-H100.png)
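A quick check of the 5+4+4+5 split and the headline bandwidth figure, using the NVLink 4.0 numbers above:

```python
# H100 baseboard: each GPU splits its 18 NVLink 4.0 links across 4 NVSwitch chips.
PER_LINK_UNIDIR_GBPS = 25
LINKS_PER_SWITCH_GROUPING = [5, 4, 4, 5]   # links from one GPU to each of the 4 NVSwitch chips

links_per_h100 = sum(LINKS_PER_SWITCH_GROUPING)
bidir_bw = PER_LINK_UNIDIR_GBPS * links_per_h100 * 2

assert links_per_h100 == 18
print(f"Per-GPU bidirectional NVLink bandwidth: {bidir_bw} GB/s")  # 900 GB/s
```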
The OSFP interfaces of the NVSwitch chips in the DGX system are used to extend the NVLink network beyond a single node, as in the DGX H100 256-GPU SuperPOD solution.
![DGX H100 256 SuperPOD](https://www.fibermall.com/blog/wp-content/uploads/2025/02/DGX-H100-256-SuperPOD.png)
2024: Blackwell Architecture with B200
In 2024, Nvidia introduced the Blackwell architecture with the B200 GPU, featuring NVLink and NVSwitch versions 5.0 and 4.0, respectively. The per-link unidirectional bandwidth doubled to 50GB/s while the link count stayed at 18, resulting in a total bidirectional interconnect bandwidth of 1.8TB/s. Each NVSwitch chip has 72 NVLink 5.0 ports, and in the 8-GPU system each GPU connects to each of the two NVSwitch chips with 9 NVLink connections.
![Blackwell Architecture with B200](https://www.fibermall.com/blog/wp-content/uploads/2025/02/Blackwell-Architecture-with-B200-1024x532.png)
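Assuming the 8-GPU B200 baseboard with two NVSwitch chips described above, the per-GPU bandwidth and the switch port budget work out as follows:

```python
# B200 / NVLink 5.0 figures from the text above (8-GPU baseboard with 2 NVSwitch chips assumed).
PER_LINK_UNIDIR_GBPS = 50
LINKS_PER_B200 = 18
GPUS = 8
NVSWITCH_CHIPS = 2
PORTS_PER_NVSWITCH = 72      # NVLink 5.0 ports per NVSwitch chip

bidir_bw_tbps = PER_LINK_UNIDIR_GBPS * LINKS_PER_B200 * 2 / 1000
links_to_each_switch = LINKS_PER_B200 // NVSWITCH_CHIPS

# All GPU links are absorbed by the two switch chips: 8 x 18 == 2 x 72.
assert GPUS * LINKS_PER_B200 == NVSWITCH_CHIPS * PORTS_PER_NVSWITCH

print(f"Per-GPU bidirectional NVLink bandwidth: {bidir_bw_tbps} TB/s")    # 1.8 TB/s
print(f"Links from each GPU to each NVSwitch:   {links_to_each_switch}")  # 9
```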
With the B200 release, Nvidia also introduced the NVL72, a rack-scale GPU system that uses NVLink Switches to achieve full interconnectivity among 72 GPUs.
The logical topology for interconnecting the 72 GPUs through 9 NVLink Switch trays is as follows:
![72 GPUs using 9 NVLink Switches](https://www.fibermall.com/blog/wp-content/uploads/2025/02/72-GPUs-using-9-NVLink-Switches.png)
Each B200 GPU has 18 NVLink ports, giving a total of 1,296 NVLink connections (72×18). A single Switch Tray contains two NVLink Switch chips, each providing 72 interfaces (144 per tray). Thus, 9 Switch Trays (1,296 ÷ 144) are required to fully interconnect the 72 GPUs, as the short calculation below confirms.
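A minimal sketch of that tray count, using only the figures above:

```python
# NVL72 switch-tray count, from the figures above.
GPUS = 72
LINKS_PER_GPU = 18                                 # NVLink 5.0 ports per B200
CHIPS_PER_TRAY = 2
PORTS_PER_CHIP = 72

total_gpu_links = GPUS * LINKS_PER_GPU             # 1,296 links in total
ports_per_tray = CHIPS_PER_TRAY * PORTS_PER_CHIP   # 144 ports per switch tray
trays_needed = total_gpu_links // ports_per_tray

print(f"Total GPU NVLink ports: {total_gpu_links}")  # 1296
print(f"Ports per switch tray:  {ports_per_tray}")   # 144
print(f"Switch trays required:  {trays_needed}")     # 9
```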