Elon Musk’s AI Supercomputer Data Center: Massive Computing Power Meets Environmental Challenges

Elon Musk’s Memphis Supercluster, which he calls the world’s most powerful AI data center, has officially come online. According to Musk, the facility uses 100,000 liquid-cooled H100 GPUs on a single RDMA fabric, making it the most powerful AI training cluster in the world. That much compute comes with an enormous appetite for electricity. Each H100 GPU draws about 700 watts under full load, so the GPUs alone account for roughly 70 megawatts at full capacity, and the facility as a whole needs more than that. The other servers, networking equipment, and cooling systems all add to the total, pushing the facility’s demand well beyond that of a typical data center.
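The arithmetic behind the 70-megawatt figure can be sketched in a few lines of Python. The GPU count and per-GPU wattage come from the article. The `overhead_factor` parameter and its value of 1.5 are purely illustrative assumptions standing in for servers, networking, and cooling; the article gives no such number.

```python
# Figures quoted in the article.
GPU_COUNT = 100_000      # H100 GPUs in the Memphis Supercluster
WATTS_PER_GPU = 700      # per-GPU draw cited in the article


def gpu_power_mw(gpu_count: int, watts_per_gpu: float) -> float:
    """Return the combined GPU power draw in megawatts."""
    return gpu_count * watts_per_gpu / 1_000_000


def facility_power_mw(gpu_mw: float, overhead_factor: float) -> float:
    """Scale GPU power by a multiplier for non-GPU loads.

    overhead_factor is a hypothetical parameter, not a figure from the article.
    """
    return gpu_mw * overhead_factor


if __name__ == "__main__":
    gpu_mw = gpu_power_mw(GPU_COUNT, WATTS_PER_GPU)
    print(f"GPUs alone: {gpu_mw:.0f} MW")
    # 1.5 is an illustrative assumption only.
    print(f"With assumed 1.5x overhead: {facility_power_mw(gpu_mw, 1.5):.0f} MW")
```

Whatever the real overhead is, the point is that the GPU-only figure is a floor for the facility’s demand, not an estimate of it.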

Because a power supply agreement with the local grid has not yet been finalized, Musk is currently relying on 14 large mobile generators to power the data center. AI and semiconductor analyst Dylan Patel initially said on social media that limited power supply might keep the Memphis Supercluster from reaching its expected operating efficiency. He noted that the 7 megawatts then being drawn from the grid could support only about 4,000 GPUs. The Tennessee Valley Authority (TVA) plans to supply 50 megawatts to the facility by August 1, but only once xAI, Musk’s AI company, signs the necessary agreement.

Analyzing satellite imagery, Patel found that Musk had deployed 14 VoltaGrid mobile generators, each rated at 2.5 megawatts, for a total of 35 megawatts. Together with 8 megawatts from the grid, that makes 43 megawatts, enough to run about 32,000 H100 GPUs at reduced power. If the TVA delivers the 50 megawatts in early August, Musk will be able to run 64,000 GPUs at once.
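As a rough check on these numbers, the short Python sketch below tallies the supply and works out the all-in power budget per GPU that each of Patel’s estimates implies. All inputs are figures quoted in the article; the function names are illustrative, and the per-GPU budget is a derived quantity that the article does not state.

```python
# Figures quoted in the article.
GENERATOR_COUNT = 14
MW_PER_GENERATOR = 2.5
GRID_MW = 8


def total_supply_mw(generators: int, mw_each: float, grid_mw: float) -> float:
    """Sum generator output and grid supply, in megawatts."""
    return generators * mw_each + grid_mw


def implied_watts_per_gpu(supply_mw: float, gpu_count: int) -> float:
    """Return the all-in power budget per GPU, in watts, implied by an estimate."""
    return supply_mw * 1_000_000 / gpu_count


if __name__ == "__main__":
    supply = total_supply_mw(GENERATOR_COUNT, MW_PER_GENERATOR, GRID_MW)
    print(f"Generators plus grid: {supply:.1f} MW")  # 43.0 MW

    # Patel's two estimates: 7 MW for about 4,000 GPUs, 43 MW for about 32,000.
    print(f"Initial estimate: {implied_watts_per_gpu(7, 4_000):.0f} W per GPU")
    print(f"Later estimate:   {implied_watts_per_gpu(supply, 32_000):.0f} W per GPU")
```

Both implied budgets are well above the 700 watts of the GPU itself, which fits the earlier point that servers, networking, and cooling draw power too. The lower figure for the 43-megawatt case is consistent with the GPUs running at reduced power.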

The sheer power consumption of AI data centers, and its contribution to global warming, is a major challenge. The data center GPUs sold in 2023 alone consume more electricity than 1.3 million average American households combined, putting heavy pressure on the grid. Meeting that demand takes more than new power plants: high-voltage transmission lines, substations, and other infrastructure are needed to carry electricity from the plants to the servers. On top of the time and cost of building generation capacity for AI computing, greenhouse gas emissions also have to be taken into account.

The mobile generators at the Memphis Supercluster burn natural gas, which is cleaner than coal or oil, but they still emit carbon into the atmosphere while running. Google recently disclosed that its carbon footprint has grown 48% since 2019 because of data center energy demand. Unless Musk moves to cleaner sources of power, xAI will face the same problem.

Musk is pushing hard to make xAI a front-runner in AI development and is doing whatever it takes to meet the data center’s power needs. Hopefully the mobile generators are only a stopgap; the Memphis Supercluster needs to move to cleaner energy, which the Tennessee Valley Authority can provide. Because the TVA generates electricity from a mix of nuclear, hydro, and fossil fuels, xAI’s carbon footprint would be smaller if it bought power from the TVA instead of relying on generators that burn only natural gas.

In short, Musk’s AI data center is chasing enormous computing power while also facing the question of how to handle its energy and environmental impact. Addressing that will take not only technical innovation but also policy support and cooperation to keep the AI industry’s growth sustainable.

Source: https://www.ithome.com/0/783/991.htm
