News Title: “CPU Overtakes NPU: llama.cpp Generation Speed Soars 5X”
Keywords: CPU, NPU, LLM
News Content: In the field of artificial intelligence, the growth of computational capability has long been a topic of significant interest. Recently, a new study showed that traditional CPUs (Central Processing Units) can surpass dedicated AI accelerators, NPUs (Neural Processing Units), on natural language processing workloads. The research was carried out by scientists from several institutions, who found that with algorithmic and hardware-aware optimization, CPUs can achieve higher efficiency when running large language models (LLMs).
Specifically, the researchers used llama.cpp, an open-source library for LLM text generation, and observed up to a 5X speedup in generation speed on CPUs. This suggests that, in certain scenarios, CPUs may become the preferred hardware platform for complex natural language tasks.
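The article does not explain where the speedup comes from, but a first-order reason low-bit inference can be fast on CPUs is that single-token (autoregressive) decoding is typically memory-bandwidth bound: every generated token must stream the model's weights from memory, so throughput scales with how few bytes each token requires. The back-of-envelope sketch below illustrates this; all numbers (7B parameters, 50 GB/s) are illustrative assumptions, not measurements from the study.

```python
def est_tokens_per_sec(n_params, bits_per_weight, mem_bw_bytes_per_sec):
    """Rough upper bound on decode throughput for a memory-bound model:
    assume every generated token reads all weights from memory once."""
    bytes_per_token = n_params * bits_per_weight / 8
    return mem_bw_bytes_per_sec / bytes_per_token

# Illustrative: a 7B-parameter model on a machine with ~50 GB/s of bandwidth.
fp16 = est_tokens_per_sec(7e9, 16, 50e9)  # 16-bit weights: ~3.6 tokens/s
int2 = est_tokens_per_sec(7e9, 2, 50e9)   # 2-bit weights:  ~28.6 tokens/s
```

Under this simple model, halving the bits per weight doubles the throughput ceiling, which is why aggressive quantization pairs so well with CPU inference.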
Furthermore, the research has advanced a new paradigm for LLM edge deployment: the open-sourcing of T-MAC. T-MAC is a new software architecture that allows LLMs to run directly on mobile and edge devices, without transferring data to cloud servers. This on-device deployment significantly reduces latency, improves responsiveness, and lessens dependence on the network, which is important for applications that must process large volumes of natural language data in real time.
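The article does not describe T-MAC's internals. One widely used trick for fast low-bit inference on CPUs is to replace multiply-accumulate operations with table lookups: partial dot products for every possible weight bit pattern are precomputed once per activation vector, and each weight group then costs a single lookup. The toy sketch below shows that general idea for 1-bit (+1/-1) weights in plain Python; it is an illustrative example of the technique, not T-MAC's actual implementation, and the names (`GROUP`, `build_tables`, `lut_dot`) are invented here.

```python
from itertools import product

GROUP = 4  # weights handled 4 at a time; 1-bit weights -> 2**4 = 16 patterns

def build_tables(activations):
    """For each group of GROUP activations, precompute the partial dot
    product for every possible +1/-1 weight pattern in that group."""
    tables = []
    for i in range(0, len(activations), GROUP):
        group = activations[i:i + GROUP]
        table = [sum(w * a for w, a in zip(pattern, group))
                 for pattern in product((1, -1), repeat=len(group))]
        tables.append(table)
    return tables

def pattern_index(weights):
    # Map a +1/-1 weight group to its row in the precomputed table
    # (+1 -> bit 0, -1 -> bit 1, first weight most significant).
    idx = 0
    for w in weights:
        idx = idx * 2 + (0 if w == 1 else 1)
    return idx

def lut_dot(weight_row, tables):
    """Dot product of a 1-bit weight row with the activations that
    produced `tables`, using one table lookup per weight group
    instead of GROUP multiplies and adds."""
    total = 0
    for g, table in enumerate(tables):
        group = weight_row[g * GROUP:(g + 1) * GROUP]
        total += table[pattern_index(group)]
    return total
```

The tables are rebuilt per activation vector but reused across every weight row of the matrix, so for a large weight matrix the lookup cost amortizes and the multiplies disappear from the inner loop.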
This series of advancements not only improves computational efficiency but may also change how AI is applied across industries. As the technology matures, future AI applications may place more emphasis on local, on-device processing, delivering faster and more reliable services.
[Source] https://36kr.com/p/2904311413643905