Detailed Analysis of NVIDIA GH200 Chip, Servers, and Cluster Networking

Traditional OEM GPU Servers: Intel/AMD x86 CPU + NVIDIA GPU

Before 2024, both NVIDIA’s own servers and third-party servers equipped with NVIDIA GPUs were built around x86 CPU machines, with the GPUs attached to the motherboard as PCIe add-in cards or as 8-GPU (SXM) modules.

Typical 8-Card A100 Host Hardware Topology

At this stage, the CPU and GPU were independent: server manufacturers assembled their machines by purchasing GPU modules (e.g., an 8×A100 module), while the choice of Intel or AMD CPUs depended on performance, cost, or cost-effectiveness considerations.

Next-Generation OEM GPU Servers: NVIDIA CPU + NVIDIA GPU

With the advent of the NVIDIA GH200 chip in 2024, NVIDIA’s GPUs began to include integrated CPUs.

  • Desktop Computing Era: The CPU was primary, with the GPU (graphics card) as a secondary component. The CPU chip could integrate a GPU chip, known as an integrated graphics card.
  • AI Data Center Era: The GPU has taken the primary role, with the CPU becoming secondary. The GPU chip/card now integrates the CPU.

As a result, NVIDIA’s level of integration has increased, and it has started offering complete machines and even full racks.

CPU Chip: Grace, an Arm CPU designed on the Armv9 architecture.

GPU Chip: Hopper/Blackwell/…

For example, the Hopper series initially released the H100-80GB, followed by further iterations:

  • H800: A cut-down version of the H100.
  • H200: An upgraded version of the H100.
  • H20: A cut-down version of the H200, significantly inferior to the H800.

Chip Product (Naming) Examples

Grace CPU + Hopper 200 (H200) GPU

GH200 on a single board:

NVIDIA GH200 Chip (Board) Rendering: Left: Grace CPU chip; Right: Hopper GPU chip.

Grace CPU + Blackwell 200 (B200) GPU

GB200 on a single board (module), with high power consumption and integrated liquid cooling:

NVIDIA GB200 Rendering: A module including 2 Grace CPUs + 4 B200 GPUs, with an integrated liquid cooling module.

72 B200s form an OEM cabinet NVL72:

NVIDIA GB200 NVL72 Cabinet

Internal Design of GH200 Servers

GH200 Chip Logical Diagram

Integration of CPU, GPU, RAM, and VRAM into a Single Chip

The logical diagram of a single NVIDIA GH200 chip

Core Hardware

As illustrated in the diagram, a single GH200 superchip integrates the following core components:

  • One NVIDIA Grace CPU
  • One NVIDIA H200 GPU
  • Up to 480GB of CPU memory
  • 96GB or 144GB of GPU VRAM

Chip Hardware Interconnects

The CPU connects to the motherboard via four PCIe Gen5 x16 links:

  • Each PCIe Gen5 x16 link offers 128GB/s of bidirectional bandwidth
  • The four links therefore provide 512GB/s in total

The CPU and GPU are interconnected using NVLink® Chip-2-Chip (NVLink-C2C) technology:

  • 900GB/s of bidirectional bandwidth, roughly seven times that of a PCIe Gen5 x16 link

GPU-to-GPU interconnects (within the same host and across hosts) use 18 NVLINK4 links:

  • 900GB/s in total
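
As a quick sanity check on these figures (using commonly quoted per-link numbers rather than anything stated above): a PCIe Gen5 x16 link carries about 64GB/s per direction, i.e. 128GB/s bidirectional, so four links give 4 × 128GB/s = 512GB/s; an NVLINK4 link is commonly quoted at 50GB/s bidirectional, so 18 × 50GB/s = 900GB/s, and 900 / 128 ≈ 7, which is where the “seven times PCIe Gen5 x16” comparison comes from.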

NVLink-C2C provides what NVIDIA refers to as “memory coherency,” ensuring consistency between memory and VRAM. The benefits include:

  • A unified pool of up to 624GB of memory plus VRAM (480GB + 144GB), which applications can use without distinguishing between the two, improving developer productivity
  • Concurrent, transparent access to CPU and GPU memory from both the CPU and the GPU
  • GPU VRAM can be oversubscribed, spilling over into CPU memory when needed, thanks to the high interconnect bandwidth and low latency (see the sketch below)
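
To make “use it without distinction” concrete, below is a minimal CUDA sketch (not NVIDIA sample code, and not GH200-specific; the buffer size and names are illustrative). It allocates a single managed buffer larger than one GH200’s HBM and touches it from both the GPU and the CPU, letting the driver migrate pages over NVLink-C2C as needed:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Fill the buffer from the GPU; each thread writes one element.
__global__ void fill(float *x, size_t n, float v) {
    size_t i = (size_t)blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] = v;
}

int main() {
    // Deliberately larger than the 96GB/144GB of HBM on one GH200, but well within
    // the ~624GB of combined CPU + GPU memory of the superchip (size is illustrative).
    const size_t n = 50ULL * 1024 * 1024 * 1024;   // 50 Gi floats ≈ 215GB
    float *x = nullptr;

    // One managed allocation: the same pointer is usable from both the CPU and the GPU.
    // The driver migrates pages on demand; the NVLink-C2C bandwidth is what makes this
    // kind of VRAM oversubscription practical on GH200.
    if (cudaMallocManaged(&x, n * sizeof(float)) != cudaSuccess) {
        printf("allocation failed\n");
        return 1;
    }

    fill<<<(unsigned)((n + 255) / 256), 256>>>(x, n, 1.0f);   // touched by the GPU
    cudaDeviceSynchronize();

    printf("x[0] = %.1f, x[n-1] = %.1f\n", x[0], x[n - 1]);   // touched by the CPU
    cudaFree(x);
    return 0;
}
```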

Next, let’s delve into the hardware components such as the CPU, memory, and GPU.

CPU and Memory

72-core ARMv9 CPU

The 72-core Grace CPU is based on the Neoverse V2 Armv9 core architecture.

480GB LPDDR5X (Low-Power DDR) Memory

  • Supports up to 480GB of LPDDR5X memory
  • Up to 500GB/s of memory bandwidth per CPU

To put this speed in the context of storage: it is roughly 70x the sequential read bandwidth of a fast PCIe Gen4 NVMe SSD (about 7GB/s).

Comparison of Three Types of Memory: DDR vs. LPDDR vs. HBM

The vast majority of servers use DDR memory, connected to the CPU via DIMM slots on the motherboard. The first through fourth generations of LPDDR are the low-power counterparts of DDR1 through DDR4 and are commonly used in mobile devices.

  • LPDDR5 is designed independently of DDR5 and was even produced earlier than DDR5
  • It is soldered directly next to the CPU, so it is non-removable and non-expandable; this raises cost but delivers higher bandwidth
  • A similar type is GDDR, used in GPUs like the RTX 4090

GPU and VRAM

H200 GPU Computing Power

Details of the H200 GPU’s computing power are given in the product specification table below.

VRAM Options

Two types of VRAM are supported, with a choice between:

  • 96GB HBM3
  • 144GB HBM3e, offering 4.9TB/s bandwidth, which is 50% higher than the H100 SXM.
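
(For reference, the H100 SXM’s HBM3 bandwidth is commonly listed as 3.35TB/s, and 4.9 / 3.35 ≈ 1.46, which is where the roughly-50%-higher figure comes from.)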

The GH200 NVL2 variant places two GH200 chips on a single board, doubling the CPU, GPU, RAM, and VRAM, with full NVLink interconnection between the two chips. For example, in a server that can accommodate 8 boards:

  • Using single-GH200 boards: the server has 8 × {1 Grace CPU (72 cores), 1 H200 GPU}
  • Using GH200 NVL2 boards: the server has 8 × {2 Grace CPUs (144 cores), 2 H200 GPUs}
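
Back-of-the-envelope totals for this 8-board example, assuming the 144GB HBM3e configuration per GH200: the single-GH200 build has 8 CPUs and 8 GPUs with 8 × 480GB ≈ 3.8TB of LPDDR5X and 8 × 144GB ≈ 1.2TB of HBM3e, while the NVL2 build doubles this to 16 CPUs and 16 GPUs with roughly 7.7TB + 2.3TB ≈ 10TB of combined memory.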

GH200 & GH200 NVL2 Product Specifications (Computing Power)

GH200 & GH200 NVL2 Product

The product specifications for NVIDIA GH200 are provided. The upper section includes CPU, memory, and other parameters, while the GPU parameters start from “FP64.”

GH200 Servers and Networking

There are two server configurations, corresponding to PCIe-only GPU connectivity and NVLINK-connected GPUs, described below.

NVIDIA MGX with GH200: OEM Host and Networking

The diagram below illustrates a networking method for a single-card node:

NVIDIA MGX with GH200: single-node networking topology

  • Each node contains only one GH200 chip; the GPU is attached like a PCIe device, with no NVLINK to GPUs in other nodes.
  • Each node’s network/accelerator cards (BlueField-3 (BF3) DPUs) connect to the switches.
  • There is no direct GPU-to-GPU connection across nodes; communication goes through the host network (GPU -> CPU -> NIC), as sketched after this list.
  • Suitable for HPC workloads and small to medium-scale AI workloads.
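
The GPU -> CPU -> NIC path above can be pictured with the following minimal CUDA sketch. It is a hypothetical illustration, not an NVIDIA API: data is staged from VRAM into host memory and then handed to whatever host-side transport the cluster uses (send_over_network() is a placeholder for, e.g., a sockets- or MPI-based send):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Placeholder for the host-side transport (sockets, MPI, RDMA verbs, ...).
// Hypothetical: in this sketch it only reports what would be sent.
static void send_over_network(const void *buf, size_t bytes) {
    (void)buf;
    printf("would send %zu bytes through the host NIC (BF3)\n", bytes);
}

int main() {
    const size_t n = 1 << 20;                  // 1M floats living in GPU VRAM
    float *d_buf = nullptr, *h_buf = nullptr;
    cudaMalloc(&d_buf, n * sizeof(float));
    cudaMallocHost(&h_buf, n * sizeof(float)); // pinned host-side staging buffer

    // Step 1, GPU -> CPU: copy device memory into host memory.
    cudaMemcpy(h_buf, d_buf, n * sizeof(float), cudaMemcpyDeviceToHost);
    // Step 2, CPU -> NIC: the host pushes the staged buffer out through the network card.
    send_over_network(h_buf, n * sizeof(float));

    cudaFreeHost(h_buf);
    cudaFree(d_buf);
    return 0;
}
```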

NVIDIA GH200 NVL32: OEM 32-Card Cabinet

The 32-card cabinet connects 32 GH200 chips into a single logical GPU module using NVLINK, hence the name NVL32.


The NVL32 module is essentially a cabinet:

  • A single cabinet provides 19.5TB of memory and VRAM.
  • NVLink TLB allows any GPU to access any memory/VRAM within the cabinet.
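
As a rough check on the 19.5TB figure (assuming the 144GB HBM3e configuration per chip): 32 × (480GB + 144GB) = 32 × 624GB = 19,968GB, which is about 19.5TB if 1TB is taken as 1,024GB.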

There are three types of memory/VRAM access methods in the NVIDIA GH200 NVL32, including Extended GPU Memory (EGM).

Multiple cabinets can be interconnected through a network to form a cluster, suitable for large-scale AI workloads.
