News Title: “Mixture of Experts: A Method to Boost the Efficiency of Large Language Models”
Keywords: Mixture of Experts, MoE, Large Language Models
News Content:
As artificial intelligence technology continues to evolve, large language models (LLMs) are being applied ever more widely across fields. Their substantial computational demands, however, pose a challenge: how to improve performance without a steep increase in compute cost. Mixture of Experts (MoE), an approach for improving the efficiency of large language models, is steadily gaining attention in the industry.
The core idea of Mixture of Experts (MoE) is to divide a model into multiple expert networks, each specializing in particular tasks or data types. For any given input, only the relevant experts are activated, which keeps computational cost in check while still drawing on a large pool of specialized capacity. The idea can be traced back to a 1991 paper on adaptive mixtures of local experts, and after more than three decades of development it has been extensively explored and applied.
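In formal terms, an MoE layer is usually described as a gated combination of expert outputs. As a generic sketch (the notation here is illustrative, not taken from the survey), for an input x the layer computes

y(x) = Σ_i g_i(x) · E_i(x)

where E_i is the i-th expert network and g_i(x) is the gating (routing) weight assigned to that expert for this input; a dense MoE keeps every g_i non-zero, while sparse variants set most of them to zero.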
In recent years, the introduction and development of sparsely gated MoE, particularly in combination with Transformer-based large language models, has given the technique new momentum. Sparsely gated MoE achieves computational sparsity by computing a weighted sum of the outputs of only the top-k experts, rather than aggregating the outputs of all experts.
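As a rough, self-contained illustration of this top-k routing, here is a minimal NumPy sketch; the experts, router, and dimensions below are toy placeholders chosen for this example, not the parameters or implementation of any actual model:

import numpy as np

rng = np.random.default_rng(0)
d_model, num_experts, top_k = 16, 8, 2

# Toy experts: each expert is a single random linear map over a token vector.
expert_weights = [rng.standard_normal((d_model, d_model)) for _ in range(num_experts)]
# Toy router: produces one logit per expert for a given token.
router_weights = rng.standard_normal((d_model, num_experts))

def moe_forward(x):
    """Sparsely gated MoE for a single token vector x of shape (d_model,)."""
    logits = x @ router_weights                 # one routing score per expert
    top_idx = np.argsort(logits)[-top_k:]       # indices of the top-k experts
    gates = np.exp(logits[top_idx] - logits[top_idx].max())
    gates /= gates.sum()                        # softmax over the selected experts only
    # Weighted sum of the top-k experts' outputs; the remaining experts are never run.
    return sum(g * (x @ expert_weights[i]) for g, i in zip(gates, top_idx))

y = moe_forward(rng.standard_normal(d_model))
print(y.shape)  # -> (16,)

Because only the k selected experts are evaluated for a given token, per-token compute stays roughly constant even as more experts (and thus more total parameters) are added.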
A number of tech companies are now using the Mixture of Experts (MoE) approach to build next-generation large models; industrial-grade LLMs such as Mixtral-8x7B, Grok-1, DBRX, Arctic, and DeepSeek-V2 all employ it. A research team at the Hong Kong University of Science and Technology (Guangzhou) recently released a survey of MoE that comprehensively summarizes the related research and proposes a new taxonomy.
In summary, Mixture of Experts (MoE) offers a practical path toward the sustainable scaling of large language models: it can improve their capabilities without a substantial increase in computational requirements, thereby driving further progress in artificial intelligence.
Source: https://www.jiqizhixin.com/articles/2024-07-26-5