NVIDIA HGX B200 and Thoughts on Its Liquid Cooling Solution

The NVIDIA HGX B200 is NVIDIA’s latest high-performance computing platform, based on the Blackwell GPU architecture. It integrates several advanced technologies and components designed to deliver exceptional computing performance and energy efficiency.

HGX B200 Air-Cooled

The complete system height with the HGX B200 air-cooled module reaches 10U, with the HGX B200 air-cooled module itself accounting for approximately 6U.

Exxact TensorEX 10U HGX B200 Server

Supermicro SuperServer SYS-A22GA-NBRT (10U), 6x 5250W Redundant (3 + 3) power supplies

At the OCP Global Summit 2024, several new photographs of the NVIDIA HGX B200 were showcased. Compared to the NVIDIA HGX A100/H100/H200, a significant change is the relocation of the NVLink Switch chips to the center of the board rather than along one side. This change minimizes the maximum link distance between the GPUs and the NVLink Switch chips. The NVLink Switch complex now consists of only two chips, compared to four in the previous generation, and each chip is notably larger.

Near the edge connectors, PCIe Retimers now occupy the position previously used by the NVSwitch chips. These Retimers typically use smaller heatsinks, as their TDP (Thermal Design Power) is only around 10-15W.

HGX B200 Motherboard Without Heatsinks – 1

HGX B200 Motherboard Without Heatsinks – 2

HGX B200 Motherboard Retimer Chip Heatsink

The silkscreen on the top surface of the ExaMAX connector indicates that this is an Umbriel GB200 SXM6 8-GPU baseboard, with part number 675-26287-00A0-TS53. Close inspection reveals that the Retimer chips are manufactured by Astera Labs.

NVIDIA HGX B200 Part Number Information

NVIDIA HGX B200 Astera Labs Retimer Chip Close-Up

The perimeter of the HGX B200 motherboard is enclosed by a black aluminum-alloy mounting frame, which is used to secure the heatsinks and attach thermal interface materials.

NVIDIA HGX B200 Motherboard Heatsink Mounting Frame

Below are images of the NVLink Switch chip showcased at the 2024 OCP Global Summit.

NVIDIA HGX B200 NVLink Switch Chip Close-Up

Considerations for the Liquid Cooling Solution for HGX B200

NVIDIA has defined two TDP (Thermal Design Power) values for the B200: 1200W for liquid cooling and 1000W for air cooling. Additionally, the B100 offers a 700W envelope similar to the previous H100 SXM, allowing OEM manufacturers to reuse their existing 700W air-cooling designs. Higher TDP limits permit higher clock frequencies and more enabled arithmetic units, and thus higher performance: FP4 (Tensor Core) throughput is 20 PFLOPS for the B200 at 1200W, 18 PFLOPS for the B200 at 1000W, and 14 PFLOPS for the B100 at 700W.
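As a quick sanity check on this trade-off, the throughput-per-watt implied by these figures can be computed directly. The short Python sketch below only reuses the FP4 numbers quoted above; it is illustrative arithmetic, not an NVIDIA specification.

```python
# Illustrative arithmetic only: FP4 (Tensor Core) throughput per watt
# implied by the TDP/performance pairs quoted above.
configs = {
    "B200 / 1200 W (liquid)": (20e15, 1200),  # (FP4 FLOPS, TDP in watts)
    "B200 / 1000 W (air)":    (18e15, 1000),
    "B100 /  700 W (air)":    (14e15,  700),
}

for name, (flops, tdp_w) in configs.items():
    tflops_per_watt = flops / tdp_w / 1e12
    print(f"{name}: {tflops_per_watt:.1f} TFLOPS/W (FP4)")
```

On these headline numbers the 700W part is actually the most efficient per watt (20 vs roughly 16.7 TFLOPS/W), while the 1200W liquid-cooled limit buys the highest absolute performance.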

The OAI system employs a 4×2 cold-plate (i.e., coolant-loop) arrangement: cold liquid first flows into the cold plates over OAM 1-4, absorbs heat and warms slightly, and then passes through the cold plates over OAM 5-8. This resembles air cooling, where the airflow passes sequentially through the heatsinks of two CPUs.

In contrast, an 8×1 cold plate loop layout distributes cold liquid evenly to all 8 OAMs, avoiding higher temperatures in half of the OAMs but potentially incurring higher costs due to additional piping.
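A first-order way to quantify the difference is the steady-state energy balance Q = ṁ·c_p·ΔT: in the 4×2 (series) loop the same coolant stream absorbs the heat of two OAMs back to back, so the downstream OAMs see warmer inlet liquid, whereas the 8×1 (parallel) layout feeds every OAM at the supply temperature. The sketch below is a simplified estimate under assumed values (1200W per OAM, a water-like coolant, and 1.5 L/min per cold-plate branch); it ignores cold-plate thermal resistance and pressure-drop effects.

```python
# First-order coolant temperature-rise estimate: 4x2 (series) vs 8x1
# (parallel) cold-plate loops. Assumptions: 1200 W per OAM, water-like
# coolant (rho ~ 1000 kg/m^3, cp ~ 4186 J/(kg*K)), 1.5 L/min per branch.
RHO = 1000.0      # kg/m^3
CP = 4186.0       # J/(kg*K)
Q_OAM = 1200.0    # W per OAM (B200 liquid-cooled TDP)
FLOW_LPM = 1.5    # L/min per cold-plate branch (assumed)

m_dot = RHO * (FLOW_LPM / 1000.0) / 60.0   # mass flow per branch, kg/s
dt_per_oam = Q_OAM / (m_dot * CP)          # temperature rise per OAM plate

# 4x2 series loop: 4 branches, each crossing two OAM cold plates in series.
print(f"4x2 series  : OAM 1-4 outlets +{dt_per_oam:.1f} K, "
      f"OAM 5-8 outlets +{2 * dt_per_oam:.1f} K above supply")

# 8x1 parallel loop: 8 branches, every OAM fed at supply temperature.
print(f"8x1 parallel: every OAM outlet +{dt_per_oam:.1f} K above supply")
```

Under these assumptions the downstream OAMs in the series loop receive liquid roughly 11-12 K warmer than the upstream ones, which is the penalty the 8×1 layout pays extra piping and manifold ports to avoid.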

OAM 1.5

In the OAM 1.5 specification, the cold plate assembly is illustrated in a 4-parallel-2-series arrangement.

4-parallel-2-series versus 8×1 Configuration

NVIDIA H100 Cold Plate

H3C R5500 G6 H100 Module Liquid Cooling 4-parallel-3-series (2 GPUs in Parallel + 1 Switch in Series)

H100 8+4 (4-parallel-3-series Configuration)

Based on the H100 cold-plate configurations above, one consideration for the B200 liquid-cooling solution is as follows: the 8 GPUs and 2 Switches are divided into 2 groups, each consisting of 4 GPUs and 1 Switch, and both groups use the same cooling scheme. Each group has 2 inlet and 2 outlet ports for its cold plates: the top 2 GPUs are connected in parallel and then in series with the Switch, and the bottom 2 GPUs are likewise connected in parallel and in series with the same Switch, which results in 4 inlet/outlet ports on the Switch cold plate.
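To put rough numbers on this arrangement, the sketch below tallies the heat each inlet/outlet pair would carry and the resulting coolant temperature rise, using the 1200W liquid-cooled GPU TDP quoted above together with assumed values for the Switch power (800W is a placeholder, not a published figure) and the per-pair flow rate.

```python
# Heat-load tally for the proposed grouping: 2 groups x (4 GPUs + 1 Switch),
# each group fed by 2 inlet/outlet pairs, each pair serving 2 GPUs in
# parallel followed by the Switch cold plate in series.
# Assumptions: 1200 W per GPU, 800 W per Switch (placeholder figure),
# water-like coolant, 2.0 L/min per inlet/outlet pair.
RHO, CP = 1000.0, 4186.0          # kg/m^3, J/(kg*K)
P_GPU, P_SWITCH = 1200.0, 800.0   # W (Switch value is an assumption)
FLOW_LPM = 2.0                    # L/min per inlet/outlet pair (assumed)

m_dot = RHO * (FLOW_LPM / 1000.0) / 60.0     # kg/s per pair

q_pair = 2 * P_GPU + P_SWITCH / 2            # heat carried per pair, W
dt_after_gpus = (2 * P_GPU) / (m_dot * CP)   # mixed rise after the GPU stage
dt_pair_outlet = q_pair / (m_dot * CP)       # rise at the pair's outlet

print(f"Heat per inlet/outlet pair   : {q_pair:.0f} W")
print(f"Coolant rise after the GPUs  : {dt_after_gpus:.1f} K")
print(f"Coolant rise at pair outlet  : {dt_pair_outlet:.1f} K")
print(f"Heat per group (2 pairs)     : {2 * q_pair:.0f} W")
print(f"Total (8 GPUs + 2 Switches)  : {4 * q_pair:.0f} W")
```

Even at this crude level, the numbers highlight the consequence of placing the Switch in series: its cold plate receives coolant that has already absorbed the heat of two GPUs.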

Alternatively, the manifold can be designed with 6 inlets and 6 outlets: 4 inlet/outlet pairs serve the 8 GPUs in a 4-parallel-2-series configuration, while the remaining 2 pairs serve the 2 Switches, each connected directly to the manifold. This approach requires careful attention to the piping routing and the available space. Regardless of the chosen solution, detailed simulation evaluation and practical system design work are necessary.
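Extending the same first-order energy balance, the sketch below contrasts the coolant temperature that reaches the Switch cold plates under the two options: with 4 inlets/outlets per group the Switch sits downstream of two GPUs, while with the 6-inlet/6-outlet manifold each Switch receives liquid at the supply temperature on its own branch. The flow rate is the same assumption as in the previous sketch.

```python
# Coolant temperature reaching the Switch cold plate for the two manifold
# options. Assumptions as before: 1200 W per GPU, water-like coolant,
# 2.0 L/min per inlet/outlet pair.
RHO, CP = 1000.0, 4186.0
P_GPU = 1200.0
FLOW_LPM = 2.0
m_dot = RHO * (FLOW_LPM / 1000.0) / 60.0   # kg/s per inlet/outlet pair

# Option A: 4 inlets / 4 outlets, Switch in series after 2 parallel GPUs.
switch_inlet_rise_a = (2 * P_GPU) / (m_dot * CP)   # GPU heat already absorbed
# Option B: 6 inlets / 6 outlets, Switch on its own branch from the manifold.
switch_inlet_rise_b = 0.0                          # fed at supply temperature

print(f"Option A (4+4 ports): Switch inlet +{switch_inlet_rise_a:.1f} K above supply")
print(f"Option B (6+6 ports): Switch inlet +{switch_inlet_rise_b:.1f} K above supply,"
      f" at the cost of 2 extra inlet/outlet pairs and the associated piping")
```

This captures the essential trade-off: a cooler Switch branch versus extra manifold ports and routing complexity, which is exactly why both options call for detailed flow and thermal simulation before committing to a design.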
