The AI inference race is heating up, with competitors closing in on NVIDIA’s dominant position in AI inference, especially in terms of energy efficiency. However, NVIDIA’s newly released Blackwell chip may prove a tough act to follow. The latest round of MLPerf Inference v4.1 results, released by MLCommons, highlights the performance of a range of accelerators, including AMD’s Instinct, Google’s Trillium, Untether AI’s speedAI240, and NVIDIA’s Blackwell.
The Battle for Dominance
While NVIDIA’s GPUs continue to lead in the AI training domain, the competition is catching up in the AI inference segment. The MLPerf competition serves as a platform where technology giants and startups alike showcase their advancements in AI hardware and software. This round of the competition saw participants vying for top spots in various categories, much like the Olympics, with multiple subcategories and benchmarks.
The MLPerf Benchmark
MLPerf offers a comprehensive set of benchmarks covering a wide range of AI tasks, from popular applications like image generation (Midjourney) and LLM Q&A (ChatGPT) to less glamorous but equally critical tasks such as image classification, object detection, and recommendation engines. This round introduced a new Mixture of Experts benchmark, reflecting an increasingly popular approach to LLM deployment in which a large language model is split into several smaller, specialized expert models, each fine-tuned for specific tasks, so that only the relevant experts are activated for a given query. This reduces resource usage per query, cutting costs and increasing throughput.
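To make the routing idea concrete, here is a minimal sketch of a mixture-of-experts layer. It is illustrative only: the router, expert sizes, and top-k choice are hypothetical and do not reflect any specific MLPerf submission or vendor implementation.

```python
# Minimal mixture-of-experts sketch (illustrative; not any vendor's actual design).
import numpy as np

rng = np.random.default_rng(0)

NUM_EXPERTS = 4
HIDDEN_DIM = 8

# Each "expert" here is just a small linear layer.
expert_weights = [rng.standard_normal((HIDDEN_DIM, HIDDEN_DIM)) for _ in range(NUM_EXPERTS)]
# The router scores the experts for each token.
router_weights = rng.standard_normal((HIDDEN_DIM, NUM_EXPERTS))

def moe_layer(tokens: np.ndarray, top_k: int = 2) -> np.ndarray:
    """Route each token to its top-k experts and average their outputs.

    Only the selected experts run for each token, which is why MoE lowers
    the compute per query compared with a dense model of the same total size.
    """
    scores = tokens @ router_weights                      # (batch, NUM_EXPERTS)
    top_experts = np.argsort(-scores, axis=1)[:, :top_k]  # best experts per token
    outputs = np.zeros_like(tokens)
    for i, token in enumerate(tokens):
        for e in top_experts[i]:
            outputs[i] += token @ expert_weights[e]
        outputs[i] /= top_k
    return outputs

batch = rng.standard_normal((3, HIDDEN_DIM))  # three dummy "tokens"
print(moe_layer(batch).shape)                 # (3, 8): same shape, less compute per token
```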
NVIDIA’s Blackwell Chip Stands Out
In the highly anticipated closed data center category, NVIDIA’s H200 GPU and GH200 superchip (which combines GPU and CPU) emerged as winners. However, a deeper analysis of the performance data reveals a more nuanced picture. Some participants deployed numerous accelerators, while others used only one. Normalizing the results to queries processed per second per accelerator reveals interesting details, though this method does not account for the contribution of CPUs and interconnects to overall performance.
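The normalization itself is simple arithmetic: divide a submission’s throughput by its accelerator count. The sketch below illustrates the calculation; the system names and numbers are placeholders, not actual MLPerf v4.1 results.

```python
# Per-accelerator normalization of submitted throughput.
# Entries below are placeholders, not real MLPerf v4.1 submissions.
submissions = [
    {"system": "vendor-A-8-accel", "accelerators": 8, "queries_per_second": 24000.0},
    {"system": "vendor-B-1-accel", "accelerators": 1, "queries_per_second": 3500.0},
]

for entry in submissions:
    per_accel = entry["queries_per_second"] / entry["accelerators"]
    print(f'{entry["system"]}: {per_accel:.1f} queries/s per accelerator')

# Caveat (as noted in the text): dividing by accelerator count ignores the
# contribution of host CPUs and interconnects, so it is only a rough comparison.
```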
In the LLM Q&A task, the only benchmark it entered, NVIDIA’s Blackwell chip delivered 2.5 times the performance of all previous chips. Untether AI’s speedAI240 preview chip nearly matched the H200’s performance in the image recognition task. Google’s Trillium achieved roughly half the performance of the H100 and H200 in image generation, while AMD’s Instinct was roughly on par with the H100 in the LLM Q&A task.
Blackwell’s Key Advantages
A key factor behind the Blackwell chip’s success is its ability to run LLMs using 4-bit floating-point precision. NVIDIA and its competitors have been working to reduce the number of bits used to represent data, aiming to boost computational speed. Blackwell, the first chip to submit a benchmark result at 4-bit precision, faced the challenge of maintaining model accuracy at that lower precision. NVIDIA’s product marketing director, Dave Salvator, highlighted the significant software innovations required to meet MLPerf’s accuracy standards.
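The trade-off the text describes, fewer bits for more speed versus a loss of accuracy, can be illustrated with a toy quantization round-trip. The sketch below uses a simple signed 4-bit integer scheme for clarity; it is not NVIDIA’s FP4 format or its software pipeline.

```python
# Toy illustration of 4-bit quantization and the resulting accuracy loss.
# This is a generic integer scheme, not NVIDIA's FP4 implementation.
import numpy as np

rng = np.random.default_rng(0)
weights = rng.standard_normal(1000).astype(np.float32)

def quantize_4bit(x: np.ndarray):
    """Map floats to 16 levels (4 bits) using a per-tensor scale."""
    scale = np.abs(x).max() / 7.0          # signed 4-bit range: -8..7
    q = np.clip(np.round(x / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

q, scale = quantize_4bit(weights)
restored = dequantize(q, scale)
print("mean absolute error:", np.abs(weights - restored).mean())
# Lower precision means less memory traffic and faster math, but the rounding
# error shown here is why preserving model accuracy at 4 bits takes careful
# software work.
```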
Another crucial factor in the Blackwell chip’s success is its substantial increase in memory bandwidth, reaching 8 terabytes per second, nearly double that of the H200. NVIDIA’s GB200 Grace Blackwell superchip is designed for connectivity and scalability, supporting up to 18 NVLink connections of 100 gigabytes per second each, for a total of 1.8 terabytes per second, roughly double the interconnect bandwidth of the H100.
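As a quick check of the interconnect figures quoted above:

```python
# Back-of-the-envelope check of the NVLink numbers cited in the text.
nvlink_links = 18
per_link_gb_s = 100                       # gigabytes per second per NVLink connection
total_tb_s = nvlink_links * per_link_gb_s / 1000
print(f"Aggregate NVLink bandwidth: {total_tb_s} TB/s")   # 1.8 TB/s
```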
Salvator emphasized that as large language models continue to grow, inference will also require multi-GPU platforms to keep up with demand. Blackwell, he stated, is not just a chip but a platform. NVIDIA’s Blackwell-based system was entered in MLPerf’s preview subcategory, indicating that the chip is not yet on sale but is expected to reach the market within the next six months, ahead of the next MLPerf evaluation.
Untether AI’s Edge in Power Efficiency
Untether AI posted impressive results in power efficiency and in the edge category. The company’s focus on minimizing energy consumption and latency positions it as a strong contender in the rapidly evolving AI inference landscape.
Conclusion
The MLPerf Inference v4.1 competition has showcased the intense battle for dominance in the AI inference market. While NVIDIA’s Blackwell chip has set a new benchmark, competitors like Untether AI are making significant strides. As the demand for AI inference continues to grow, the race to develop more efficient and powerful accelerators is far from over.