GDDR Memory vs HBM Memory

What is GDDR Memory?

GDDR (Graphics Double Data Rate) is a type of memory designed specifically for graphics cards. It is similar to the DDR memory used in most computers, but optimized for the access patterns of GPUs: GDDR memory generally has higher bandwidth than DDR memory, which means it can transfer more data at once.

GDDR6 is the current mainstream GDDR standard for GPUs, with a peak per-pin data rate of 16 Gb/s. It is used in most GPUs, including the NVIDIA RTX 6000 Ada and AMD Radeon PRO W7900, and remains common in GPUs shipping in 2024.

NVIDIA has also worked with Micron on GDDR6X, a successor to GDDR6. Aside from changing the signaling encoding from NRZ to PAM4, which raises the per-pin data rate to 21 Gb/s, GDDR6X involves few hardware changes, and since NVIDIA is its only user, it has not been standardized by JEDEC. GDDR7 is the next GDDR standard expected to be widely adopted across the industry.

As of 2024, the widest memory bus paired with GDDR6 and GDDR6X is 384 bits. GDDR memory consists of individual chips soldered to the PCB around the GPU die.
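Peak memory bandwidth follows directly from the bus width and per-pin data rate quoted above. A quick back-of-the-envelope sketch in Python (the helper function and figures are illustrative, using the 16 Gb/s and 21 Gb/s rates mentioned in this article):

```python
def peak_bandwidth_gbps(bus_width_bits: int, pin_rate_gbps: float) -> float:
    """Peak bandwidth in GB/s: bus width (bits) x per-pin rate (Gb/s) / 8 bits per byte."""
    return bus_width_bits * pin_rate_gbps / 8

# A 384-bit GDDR6 bus at 16 Gb/s per pin:
print(peak_bandwidth_gbps(384, 16))  # 768.0 GB/s
# The same 384-bit bus with GDDR6X at 21 Gb/s per pin:
print(peak_bandwidth_gbps(384, 21))  # 1008.0 GB/s
```

This is why, with the bus width capped at 384 bits, the only way to push GDDR bandwidth higher is to raise the per-pin rate.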


What is HBM Memory?

HBM stands for High Bandwidth Memory, a type of stacked memory developed specifically for GPUs and other high-performance accelerators.

HBM memory is designed to provide a much wider memory bus than GDDR memory, which means it can transfer more data at once. A single HBM stack runs at a lower per-pin data rate than a GDDR6 chip, but this is what makes it more energy efficient than GDDR memory per bit transferred, an important consideration for power-constrained systems.

HBM memory is located inside the GPU package and is stacked vertically. For example, a 4-Hi HBM stack contains four DRAM dies, each with two 128-bit channels, for a total width of 1024 bits (4 dies * 2 channels * 128 bits). Because HBM is integrated into the GPU package alongside the GPU die, it saves board space and shortens signal paths, but it also fixes the memory configuration at manufacture: a single GPU cannot scale its memory as easily as a GPU equipped with GDDR.
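The stack arithmetic above can be checked in a couple of lines. This sketch assumes the 4-Hi, two-channels-per-die layout described here; the five-stack total is illustrative, chosen because it matches the 5120-bit figure quoted below for some accelerators:

```python
def hbm_bus_width(dies_per_stack: int, channels_per_die: int, bits_per_channel: int) -> int:
    """Total bus width of one HBM stack in bits."""
    return dies_per_stack * channels_per_die * bits_per_channel

# One 4-Hi stack: 4 dies x 2 channels x 128 bits
print(hbm_bus_width(4, 2, 128))      # 1024 bits
# A GPU package with five such stacks:
print(5 * hbm_bus_width(4, 2, 128))  # 5120 bits
```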

The most widely adopted recent HBM generation is HBM3, used in the NVIDIA H100 with a 5120-bit bus and over 2 TB/s of memory bandwidth. HBM3 is also present in its competitor, the AMD Instinct MI300X, with an 8192-bit bus and over 5.3 TB/s of memory bandwidth. NVIDIA has since introduced HBM3e memory in the GH200 and H200, the first accelerators and processors to use it, offering even greater memory bandwidth. Hardware equipped with HBM memory is being refreshed at a rapid pace. One important reason accelerator GPUs such as the H100 and MI300X need HBM is interconnectivity between multiple GPUs: to communicate with each other, wide bus widths and fast data transfer rates are critical to keep data transfer from one GPU to another from becoming a bottleneck.
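Working backwards from the bus widths and bandwidth figures quoted above shows how different the HBM approach is from GDDR: the implied per-pin rates are a fraction of GDDR6's 16 Gb/s, with the wide bus making up the difference. A rough sketch (the helper function is illustrative and the resulting rates are approximations derived from the rounded figures above):

```python
def implied_pin_rate_gbps(bandwidth_gbps: float, bus_width_bits: int) -> float:
    """Approximate per-pin data rate in Gb/s implied by total bandwidth and bus width."""
    return bandwidth_gbps * 8 / bus_width_bits

# H100-class HBM3: ~2000 GB/s over a 5120-bit bus
print(round(implied_pin_rate_gbps(2000, 5120), 2))  # 3.12 Gb/s per pin
# MI300X-class HBM3: ~5300 GB/s over an 8192-bit bus
print(round(implied_pin_rate_gbps(5300, 8192), 2))  # 5.18 Gb/s per pin
```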


GDDR vs HBM

Which type of memory is better for GPU? The answer is that it depends on the specific scenario.

GPUs equipped with GDDR memory are typically:

  • More accessible, because they are mainstream GPU products
  • Cheaper, because GDDR chips are soldered directly onto the PCB rather than integrated into the GPU package
  • Sufficient for most mainstream applications, which do not saturate memory bandwidth, though GDDR generally consumes more energy per bit transferred

GPUs equipped with HBM memory are typically:

  • Less accessible, more niche
  • Very expensive, found in flagship accelerators like the H100.
  • Only for HPC and high-niche workloads that require the most bandwidth
  • Efficient, with wide bus widths that make up for lower per-pin data rates

Most applications do not require HBM memory. But for workloads that move large amounts of data, memory bandwidth is of utmost importance: simulations, real-time analytics, intensive AI training, complex AI inference, and similar workloads can all benefit from more memory bandwidth.

It’s also important to consider that the fastest GDDR-equipped GPUs work just fine when workloads can be parallelized. The NVIDIA RTX 6000 Ada is a powerful flagship GPU with 960 GB/s of memory bandwidth, ideal for small to medium-sized AI training, rendering, analytics, simulation, and data-intensive workloads. It fits in servers or workstations with multi-GPU setups, where work can be split and parallelized for even higher performance.

However, HBM-equipped GPUs like the NVIDIA H100 can significantly improve productivity in enterprise deployments, albeit at a high cost. Higher performance and less waiting lead to faster breakthroughs. Deployments such as ChatGPT leverage clusters of H100s working together to perform real-time inference for millions of concurrent users, processing prompts and delivering outputs in real time.

Without fast, high-bandwidth memory and peak performance, enterprise deployments can become too slow to use. The launch of ChatGPT is a good example. OpenAI likely believed it had enough HBM-equipped NVIDIA GPUs to handle a large number of concurrent users, but had no idea how popular its new generative AI chatbot would become. It had to cap the number of concurrent users, asking site visitors to be patient with the service while it scaled up its infrastructure. Seen from this perspective, ChatGPT might not have been feasible without GPUs built on high-bandwidth memory.

Conclusion

In summary, both GDDR memory and HBM memory have their pros and cons. GDDR memory is cheaper and is a good choice for applications that need high bandwidth but not the absolute highest performance. HBM memory is more expensive but offers far higher bandwidth, making it a good choice for applications that demand peak performance. When choosing between the two, it is important to consider the scenario and the cost.
