AI Computing Hardware: ConnectX-8 SuperNIC

Product Overview

The ConnectX-8 SuperNIC is the eighth generation of NVIDIA's ConnectX smart network interface cards, designed for next-generation AI computing clusters, large-scale data centers, and high-performance computing (HPC). It deeply integrates network acceleration with computational offload and provides 400GbE/800GbE connectivity. Through hardware-level protocol offload and GPU-NIC co-optimization, it significantly reduces network latency and raises throughput, delivering low-latency, lossless network transport for AI training, inference, and distributed storage workloads.


Software Protocols and Acceleration Functions

ConnectX-8 SuperNIC optimizes full-stack network performance through the deep collaboration of the software protocol stack and hardware acceleration engine:

Protocol Support

  • RDMA/RoCEv2: Remote Direct Memory Access over Converged Ethernet (v2), enabling zero-copy data transfer with latency as low as sub-microsecond levels (a minimal device-enumeration sketch follows this list).
  • GPUDirect Technology: Supports GPUDirect RDMA and GPUDirect Storage, giving the NIC and storage a direct path to GPU memory and bypassing CPU memory copies.
  • NVIDIA SHARPv3: In-network hardware acceleration of collective operations such as AllReduce and Broadcast, improving AI training efficiency.
  • TLS/IPsec Hardware Offload: Encrypts and decrypts full traffic in hardware without performance loss.
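
Applications typically consume the RDMA/RoCE capabilities above through the standard libibverbs API. The sketch below only enumerates RDMA-capable devices on a Linux host with rdma-core installed and prints a few capability limits; it is a minimal illustration of device discovery, not a full RoCE data path (build with `gcc example.c -libverbs`).

```c
#include <stdio.h>
#include <infiniband/verbs.h>

int main(void) {
    int num = 0;
    /* List every RDMA-capable device (ConnectX NICs appear here when the
       mlx5 driver is loaded). */
    struct ibv_device **list = ibv_get_device_list(&num);
    if (!list) { perror("ibv_get_device_list"); return 1; }

    for (int i = 0; i < num; i++) {
        struct ibv_context *ctx = ibv_open_device(list[i]);
        if (!ctx) continue;

        struct ibv_device_attr attr;
        if (!ibv_query_device(ctx, &attr))
            printf("%s: max_qp=%d max_cq=%d\n",
                   ibv_get_device_name(list[i]), attr.max_qp, attr.max_cq);

        ibv_close_device(ctx);
    }
    ibv_free_device_list(list);
    return 0;
}
```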

Software Ecosystem

  1. DOCA 2.0 (Data Center Infrastructure-on-a-Chip Architecture): Provides an API-driven development framework for user-defined data-plane acceleration functions (e.g., collaborative orchestration with DPUs).
  2. Deep Integration with the CUDA Ecosystem: Multi-GPU, cross-node communication is optimized through the NCCL library (a minimal AllReduce sketch follows this list).
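
For orientation, here is a minimal sketch of how NCCL is commonly driven from C/CUDA: one process creates a communicator per local GPU and issues a sum-AllReduce. The device count and buffer size are illustrative assumptions, buffers are left uninitialized and error checks are omitted for brevity; in a real multi-node job the same calls run under MPI or a framework such as PyTorch, and NCCL selects the transport (NVLink, PCIe, or RDMA/RoCE through the NIC).

```c
#include <cuda_runtime.h>
#include <nccl.h>

#define NGPUS 2          /* assumption: two local GPUs */
#define COUNT (1 << 20)  /* 1M floats per GPU */

int main(void) {
    int devs[NGPUS] = {0, 1};
    ncclComm_t comms[NGPUS];
    float *sendbuf[NGPUS], *recvbuf[NGPUS];
    cudaStream_t streams[NGPUS];

    /* Allocate buffers and a stream on each GPU. */
    for (int i = 0; i < NGPUS; i++) {
        cudaSetDevice(devs[i]);
        cudaMalloc((void **)&sendbuf[i], COUNT * sizeof(float));
        cudaMalloc((void **)&recvbuf[i], COUNT * sizeof(float));
        cudaStreamCreate(&streams[i]);
    }

    /* One NCCL communicator per GPU, all owned by this process. */
    ncclCommInitAll(comms, NGPUS, devs);

    /* Group the per-GPU AllReduce calls so NCCL launches them together. */
    ncclGroupStart();
    for (int i = 0; i < NGPUS; i++)
        ncclAllReduce(sendbuf[i], recvbuf[i], COUNT, ncclFloat, ncclSum,
                      comms[i], streams[i]);
    ncclGroupEnd();

    for (int i = 0; i < NGPUS; i++) {
        cudaSetDevice(devs[i]);
        cudaStreamSynchronize(streams[i]);
    }

    for (int i = 0; i < NGPUS; i++) {
        ncclCommDestroy(comms[i]);
        cudaFree(sendbuf[i]);
        cudaFree(recvbuf[i]);
        cudaStreamDestroy(streams[i]);
    }
    return 0;
}
```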

Hardware Architecture and Connectivity Design

Host Interface

PCIe 5.0 x16 host interface with a theoretical bidirectional bandwidth of roughly 128 GB/s (about 64 GB/s per direction), providing the host bandwidth to drive 400G/800G network ports.
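
One way to confirm what the slot actually negotiated is to read the standard Linux PCI sysfs attributes for the NIC. The sketch below uses a hypothetical PCI address (0000:81:00.0); substitute the address reported by `lspci` on your host. A Gen5 x16 link should report "32.0 GT/s PCIe" and a width of 16.

```c
#include <stdio.h>

/* Hypothetical PCI address of the NIC; find yours with lspci. */
#define NIC_SYSFS "/sys/bus/pci/devices/0000:81:00.0"

static void print_attr(const char *name) {
    char path[256], buf[64];
    snprintf(path, sizeof(path), "%s/%s", NIC_SYSFS, name);
    FILE *f = fopen(path, "r");
    if (f && fgets(buf, sizeof(buf), f))
        printf("%-20s %s", name, buf);   /* sysfs values end with '\n' */
    if (f) fclose(f);
}

int main(void) {
    print_attr("current_link_speed");
    print_attr("current_link_width");
    print_attr("max_link_speed");
    print_attr("max_link_width");
    return 0;
}
```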

Network Interface

Supports single-port 800GbE OSFP112 or dual-port 400GbE QSFP112 flexible configurations.

Backward compatible with 200GbE/100GbE speeds, so it can be deployed alongside existing infrastructure.

On-Chip Acceleration Engine

Integrates dedicated ASIC engines that offload flow-table management, congestion control (DCQCN), packet validation, and related functions entirely in hardware.
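
For intuition about what the congestion-control offload does, the sketch below models the DCQCN reaction-point behavior in plain C: the sending rate is cut when a congestion notification packet (CNP) arrives and recovers toward a target rate otherwise. The real algorithm runs in NIC hardware with vendor-tuned parameters and additional fast-recovery/hyper-increase phases; the constants here are illustrative assumptions only.

```c
#include <stdio.h>

/* Simplified DCQCN reaction-point state (illustrative values). */
typedef struct {
    double rc;     /* current sending rate (Gb/s) */
    double rt;     /* target rate (Gb/s)          */
    double alpha;  /* congestion-severity estimate */
    double g;      /* alpha update gain            */
    double rai;    /* additive-increase step (Gb/s) */
} dcqcn_rp;

/* CNP received: remember the current rate as the target and cut the
   current rate in proportion to alpha. */
static void on_cnp(dcqcn_rp *s) {
    s->rt = s->rc;
    s->rc = s->rc * (1.0 - s->alpha / 2.0);
    s->alpha = (1.0 - s->g) * s->alpha + s->g;
}

/* No CNP seen in the last window: decay alpha and recover toward the
   target rate with an additive-increase step. */
static void on_quiet_period(dcqcn_rp *s) {
    s->alpha = (1.0 - s->g) * s->alpha;
    s->rt += s->rai;
    s->rc = (s->rc + s->rt) / 2.0;
}

int main(void) {
    dcqcn_rp s = { .rc = 400.0, .rt = 400.0, .alpha = 1.0,
                   .g = 1.0 / 16.0, .rai = 5.0 };
    on_cnp(&s);
    printf("after CNP:      rc = %.1f Gb/s\n", s.rc);
    for (int i = 0; i < 5; i++) on_quiet_period(&s);
    printf("after recovery: rc = %.1f Gb/s\n", s.rc);
    return 0;
}
```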


Networking Architecture and Connectivity

ConnectX-8 SuperNIC supports multi-tier Clos network topologies for building high-bandwidth, non-blocking AI computing clusters.

Single Node Connection

Each server deploys one or two ConnectX-8 NICs, connected to the host over PCIe 5.0.

Each port connects directly to a leaf switch over optical fiber (OSFP/QSFP112 transceivers), providing redundant dual uplinks.

Cluster Networking

  1. Leaf Switches: NVIDIA Quantum-3 series (800G InfiniBand) or Spectrum-4 series (400G Ethernet); the Ethernet path supports RoCEv2 and adaptive routing.
  2. Spine Switches: Fully interconnected with the leaf switches through 800G high-speed ports in a spine-leaf architecture, providing non-blocking bandwidth.
  3. GPU-Direct Networking: GPUs in different nodes access each other's memory directly via RDMA, forming a distributed training cluster (see the fabric-sizing sketch after this list).
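
As a rough illustration of non-blocking spine-leaf sizing, the sketch below splits each switch's radix evenly between server-facing and spine-facing ports and derives leaf and spine counts for a given number of NIC ports. The port counts and switch radix are illustrative assumptions, not a validated design.

```c
#include <stdio.h>

/* Rough sizing of a two-tier non-blocking spine-leaf fabric.
   Assumption: every switch has `radix` ports of equal speed, and a
   non-blocking leaf splits its ports 50/50 between NICs and uplinks. */
static void size_fabric(int nic_ports, int radix) {
    int down_per_leaf = radix / 2;              /* ports facing servers */
    int up_per_leaf   = radix - down_per_leaf;  /* ports facing spines  */
    int leaves = (nic_ports + down_per_leaf - 1) / down_per_leaf;
    int spines = (leaves * up_per_leaf + radix - 1) / radix;
    printf("%d NIC ports -> %d leaves, %d spines (radix %d)\n",
           nic_ports, leaves, spines, radix);
}

int main(void) {
    /* Example: 512 ConnectX-8 ports on hypothetical 64-port switches. */
    size_fabric(512, 64);
    return 0;
}
```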

Optical Modules and Fiber Selection

Optical Modules

800G Scenarios: OSFP112 800G-SR8/VR8 (multi-mode, up to 100m) / 800G-DR8 (single-mode, up to 500m).


400G Scenarios: QSFP112 400G-VR4/SR4/DR4.

Fiber Types:


Multi-Mode (MMF): OM5/OM4 (850nm), supporting 400G-SR4/800G-SR8 links up to 100m.

Single-Mode (SMF): OS2 (1310nm/1550nm, supporting long-distance transmission over 10km).
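
The reach figures above can be captured in a small selection helper; the sketch below mirrors only the speeds, module types, and distances quoted in this section and is not a complete optics compatibility matrix (real selection also depends on connector type, FEC, and cost).

```c
#include <stdio.h>

/* Toy module picker based on the nominal reaches listed above. */
static const char *pick_module(int gbps, double meters, int single_mode) {
    if (gbps == 800) {
        if (!single_mode && meters <= 100) return "OSFP112 800G-SR8 (MMF, OM4/OM5)";
        if (single_mode && meters <= 500)  return "OSFP112 800G-DR8 (SMF, OS2)";
    } else if (gbps == 400) {
        if (!single_mode && meters <= 100) return "QSFP112 400G-SR4 (MMF, OM4/OM5)";
        if (single_mode && meters <= 500)  return "QSFP112 400G-DR4 (SMF, OS2)";
    }
    return "longer-reach single-mode optics over OS2";
}

int main(void) {
    printf("%s\n", pick_module(800, 80, 0));   /* in-row multimode run   */
    printf("%s\n", pick_module(400, 450, 1));  /* cross-hall single-mode */
    return 0;
}
```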


Compatible Switches and Ecosystem Collaboration

NVIDIA Switches:

Quantum-3: 800G InfiniBand switch supporting SHARPv3 in-network collective communication acceleration.

Spectrum-4: 400G Ethernet switch supporting RoCEv2 and intelligent traffic scheduling.

Third-Party Switches:

Arista 7800R3 (800G), Cisco Nexus 92300YC: confirm RoCEv2 and ECMP load-balancing support before deployment.

