Detailed Analysis of NVIDIA GH200 Chip, Servers, and Cluster Networking

Traditional OEM GPU Servers: Intel/AMD x86 CPU + NVIDIA GPU

Before 2024, both NVIDIA’s own servers and third-party servers equipped with NVIDIA GPUs were built around x86 CPU machines, with the GPUs attached to the motherboard as PCIe add-in cards or as 8-GPU (SXM) modules.

Typical 8-Card A100 Host Hardware Topology

At this stage, the CPU and GPU were independent: server manufacturers assembled their machines by purchasing GPU modules (e.g., an 8×A100 module), while the choice of Intel or AMD CPUs depended on performance, cost, or cost-effectiveness considerations.

Next-Generation OEM GPU Servers: NVIDIA CPU + NVIDIA GPU

With the advent of the NVIDIA GH200 chip in 2024, NVIDIA’s GPUs began to include integrated CPUs.

  • Desktop Computing Era: The CPU was primary, with the GPU (graphics card) as a secondary component. The CPU chip could integrate a GPU chip, known as an integrated graphics card.
  • AI Data Center Era: The GPU has taken the primary role, with the CPU becoming secondary. The GPU chip/card now integrates the CPU.

As a result, NVIDIA’s level of integration has increased, and it has started offering complete machines and even full racks.

CPU Chip: Grace, an Arm CPU designed on the Armv9 architecture.

GPU Chip: Hopper/Blackwell/…

For example, the Hopper series initially released the H100-80GB, followed by further iterations:

  • H800: A cut-down version of the H100.
  • H200: An upgraded version of the H100.
  • H20: A cut-down version of the H200, significantly inferior to the H800.

Chip Product (Naming) Examples

Grace CPU + Hopper 200 (H200) GPU

GH200 on a single board:

NVIDIA GH200 Chip (Board) Rendering: Left: Grace CPU chip; Right: Hopper GPU chip.

Grace CPU + Blackwell 200 (B200) GPU

GB200 on a single board (module), with high power consumption and integrated liquid cooling:

NVIDIA GB200 Rendering: A module including 2 Grace CPUs + 4 B200 GPUs, with an integrated liquid cooling module.

72 B200s form an OEM cabinet NVL72:

NVIDIA GB200 NVL72 Cabinet

Internal Design of GH200 Servers

GH200 Chip Logical Diagram

Integration of CPU, GPU, RAM, and VRAM into a Single Chip

The logical diagram of a single NVIDIA GH200 chip

Core Hardware

As illustrated in the diagram, a single GH200 superchip integrates the following core components:

  • One NVIDIA Grace CPU
  • One NVIDIA H200 GPU
  • Up to 480GB of CPU memory
  • 96GB or 144GB of GPU VRAM

Chip Hardware Interconnects

The CPU connects to the motherboard via four PCIe Gen5 x16 links:

  • Each PCIe Gen5 x16 link offers 128GB/s of bidirectional bandwidth
  • The four links therefore provide 512GB/s in total

The CPU and GPU are interconnected using NVLink® Chip-2-Chip (NVLink-C2C) technology:

  • 900GB/s of bidirectional bandwidth, roughly seven times that of a PCIe Gen5 x16 link

GPU-to-GPU interconnects (within the same host and across hosts) use 18 NVLINK4 links:

  • 900GB/s in total
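
As a quick sanity check on these figures (using commonly quoted per-link numbers rather than anything stated above): a PCIe Gen5 x16 link carries about 64GB/s per direction, i.e. 128GB/s bidirectional, so four links give 4 × 128GB/s = 512GB/s; an NVLINK4 link is commonly quoted at 50GB/s bidirectional, so 18 × 50GB/s = 900GB/s, and 900 / 128 ≈ 7, which is where the “seven times PCIe Gen5 x16” comparison comes from.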

NVLink-C2C provides what NVIDIA refers to as “memory coherency,” ensuring consistency between memory and VRAM. The benefits include:

  • A unified pool of up to 624GB of memory plus VRAM (480GB + 144GB), which applications can use without distinguishing between the two, improving developer productivity
  • Concurrent, transparent access to CPU and GPU memory from both the CPU and the GPU
  • GPU VRAM can be oversubscribed, spilling over into CPU memory when needed, thanks to the high interconnect bandwidth and low latency (see the sketch below)
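
To make “use it without distinction” concrete, below is a minimal CUDA sketch (not NVIDIA sample code, and not GH200-specific; the buffer size and names are illustrative). It allocates a single managed buffer larger than one GH200’s HBM and touches it from both the GPU and the CPU, letting the driver migrate pages over NVLink-C2C as needed:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Fill the buffer from the GPU; each thread writes one element.
__global__ void fill(float *x, size_t n, float v) {
    size_t i = (size_t)blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] = v;
}

int main() {
    // Deliberately larger than the 96GB/144GB of HBM on one GH200, but well within
    // the ~624GB of combined CPU + GPU memory of the superchip (size is illustrative).
    const size_t n = 50ULL * 1024 * 1024 * 1024;   // 50 Gi floats ≈ 215GB
    float *x = nullptr;

    // One managed allocation: the same pointer is usable from both the CPU and the GPU.
    // The driver migrates pages on demand; the NVLink-C2C bandwidth is what makes this
    // kind of VRAM oversubscription practical on GH200.
    if (cudaMallocManaged(&x, n * sizeof(float)) != cudaSuccess) {
        printf("allocation failed\n");
        return 1;
    }

    fill<<<(unsigned)((n + 255) / 256), 256>>>(x, n, 1.0f);   // touched by the GPU
    cudaDeviceSynchronize();

    printf("x[0] = %.1f, x[n-1] = %.1f\n", x[0], x[n - 1]);   // touched by the CPU
    cudaFree(x);
    return 0;
}
```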

Next, let’s delve into the hardware components such as the CPU, memory, and GPU.

CPU and Memory

72-core ARMv9 CPU

The 72-core Grace CPU is based on the Neoverse V2 Armv9 core architecture.

480GB LPDDR5X (Low-Power DDR) Memory

  • Supports up to 480GB of LPDDR5X memory
  • Up to 500GB/s of memory bandwidth per CPU

To put this speed in the context of storage: it is roughly 70x the sequential read bandwidth of a fast PCIe Gen4 NVMe SSD (about 7GB/s).

Comparison of Three Types of Memory: DDR vs. LPDDR vs. HBM

The vast majority of servers use DDR memory, connected to the CPU via DIMM slots on the motherboard. The first through fourth generations of LPDDR are the low-power counterparts of DDR1 through DDR4 and are commonly used in mobile devices.

  • LPDDR5 is designed independently of DDR5 and was even produced earlier than DDR5
  • It is soldered directly next to the CPU, so it is non-removable and non-expandable; this raises cost but delivers higher bandwidth
  • A similar type is GDDR, used in GPUs like the RTX 4090

GPU and VRAM

H200 GPU Computing Power

Details of the H200 GPU’s computing power are given in the product specification table below.

VRAM Options

Two types of VRAM are supported, with a choice between:

  • 96GB HBM3
  • 144GB HBM3e, offering 4.9TB/s bandwidth, which is 50% higher than the H100 SXM.
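
(For reference, the H100 SXM’s HBM3 bandwidth is commonly listed as 3.35TB/s, and 4.9 / 3.35 ≈ 1.46, which is where the roughly-50%-higher figure comes from.)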

The GH200 NVL2 variant places two GH200 chips on a single board, doubling the CPU, GPU, RAM, and VRAM, with full NVLink interconnection between the two chips. For example, in a server that can accommodate 8 boards:

  • Using single-GH200 boards: the server has 8 × {1 Grace CPU (72 cores), 1 H200 GPU}
  • Using GH200 NVL2 boards: the server has 8 × {2 Grace CPUs (144 cores), 2 H200 GPUs}
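
Back-of-the-envelope totals for this 8-board example, assuming the 144GB HBM3e configuration per GH200: the single-GH200 build has 8 CPUs and 8 GPUs with 8 × 480GB ≈ 3.8TB of LPDDR5X and 8 × 144GB ≈ 1.2TB of HBM3e, while the NVL2 build doubles this to 16 CPUs and 16 GPUs with roughly 7.7TB + 2.3TB ≈ 10TB of combined memory.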

GH200 & GH200 NVL2 Product Specifications (Computing Power)

GH200 & GH200 NVL2 Product

The product specifications for NVIDIA GH200 are provided. The upper section includes CPU, memory, and other parameters, while the GPU parameters start from “FP64.”

GH200 Servers and Networking

There are two server configurations, corresponding to PCIe-only GPU connectivity and NVLINK-connected GPUs, described below.

NVIDIA MGX with GH200: OEM Host and Networking

The diagram below illustrates a networking method for a single-card node:

NVIDIA MGX with GH200: single-node networking topology

  • Each node contains only one GH200 chip; the GPU is attached like a PCIe device, with no NVLINK to GPUs in other nodes.
  • Each node’s network/accelerator cards (BlueField-3 (BF3) DPUs) connect to the switches.
  • There is no direct GPU-to-GPU connection across nodes; communication goes through the host network (GPU -> CPU -> NIC), as sketched after this list.
  • Suitable for HPC workloads and small to medium-scale AI workloads.
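
The GPU -> CPU -> NIC path above can be pictured with the following minimal CUDA sketch. It is a hypothetical illustration, not an NVIDIA API: data is staged from VRAM into host memory and then handed to whatever host-side transport the cluster uses (send_over_network() is a placeholder for, e.g., a sockets- or MPI-based send):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Placeholder for the host-side transport (sockets, MPI, RDMA verbs, ...).
// Hypothetical: in this sketch it only reports what would be sent.
static void send_over_network(const void *buf, size_t bytes) {
    (void)buf;
    printf("would send %zu bytes through the host NIC (BF3)\n", bytes);
}

int main() {
    const size_t n = 1 << 20;                  // 1M floats living in GPU VRAM
    float *d_buf = nullptr, *h_buf = nullptr;
    cudaMalloc(&d_buf, n * sizeof(float));
    cudaMallocHost(&h_buf, n * sizeof(float)); // pinned host-side staging buffer

    // Step 1, GPU -> CPU: copy device memory into host memory.
    cudaMemcpy(h_buf, d_buf, n * sizeof(float), cudaMemcpyDeviceToHost);
    // Step 2, CPU -> NIC: the host pushes the staged buffer out through the network card.
    send_over_network(h_buf, n * sizeof(float));

    cudaFreeHost(h_buf);
    cudaFree(d_buf);
    return 0;
}
```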

NVIDIA GH200 NVL32: OEM 32-Card Cabinet

The 32-card cabinet connects 32 GH200 chips into a single logical GPU module using NVLINK, hence the name NVL32.


The NVL32 module is essentially a cabinet:

  • A single cabinet provides 19.5TB of memory and VRAM.
  • NVLink TLB allows any GPU to access any memory/VRAM within the cabinet.
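
As a rough check on the 19.5TB figure (assuming the 144GB HBM3e configuration per chip): 32 × (480GB + 144GB) = 32 × 624GB = 19,968GB, which is about 19.5TB if 1TB is taken as 1,024GB.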

There are three types of memory/VRAM access methods in the NVIDIA GH200 NVL32, including Extended GPU Memory (EGM).

Multiple cabinets can be interconnected through a network to form a cluster, suitable for large-scale AI workloads.
