## Tencent’s Hunyuan Large Model Lead Wang Di: Unveiling the System Engineering Behind a Trillion-Parameter MoE
**Keywords:** Hunyuan, MoE, Engineering
**Machine Intelligence** August 21st – Recently, Machine Intelligence’s “Wise Interview” series invited Wang Di, General Manager of Tencent’s Machine Learning Platform Department and Head of the Hunyuan Large Model, for an in-depth discussion on Tencent’s journey of building a trillion-parameter MoE large model from scratch. Wang Di emphasized that large models are a cross-domain system engineering endeavor, requiring efficient integration of engineering, algorithms, data, and business applications under constraints, posing unprecedented challenges to organizational capabilities.
**The Deep Logic Behind the Trend of Smaller Models**
OpenAI’s recent release of GPT-4o mini has sparked industry attention on smaller models. Wang Di believes that GPT-4o mini’s emergence is not a sudden shift towards smaller models by OpenAI, but rather a response to application needs. Many scenarios do not require a model as massive and latency-heavy as GPT-4 Turbo, and GPT-4o mini’s speed advantage can meet these needs.
Wang Di further pointed out that the real challenge lies in shrinking the model while maintaining performance. Creating a small model is easy, but ensuring its performance matches that of a large model requires extensive optimization in data and model architecture.
**Why Tencent Chose to Develop a Large Model From Scratch**
Wang Di stated that Tencent chose to develop a large model from scratch due to the following considerations:
* **Technological Autonomy and Control:** Self-development allows for better control over core technologies, avoiding dependence on others.
* **Data Security:** Self-development ensures better protection of data security, mitigating data leakage risks.
* **Customized Needs:** Self-development allows for better fulfillment of Tencent’s own business needs, such as applications in gaming, social media, finance, and other areas.
**MoE Scaling Law: Tencent’s Focus**
In developing its large model, Tencent has focused on the MoE Scaling Law. MoE (Mixture-of-Experts) is a model architecture that increases model capacity without a proportional increase in per-token compute, lowering the training cost of reaching a given capability. Tencent has conducted in-depth research on MoE scaling laws and applied the findings to the development of the Hunyuan large model.
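To make the MoE idea concrete, here is a minimal sketch of a top-k gated Mixture-of-Experts layer in NumPy. This is illustrative only; Hunyuan's actual architecture, expert count, and routing scheme are not disclosed in the article, and all dimensions below are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

D, H = 8, 16            # model dim, expert hidden dim (illustrative)
NUM_EXPERTS, TOP_K = 4, 2

# One gating matrix plus a small ReLU MLP per expert.
gate_w = rng.normal(size=(D, NUM_EXPERTS))
experts = [(rng.normal(size=(D, H)), rng.normal(size=(H, D)))
           for _ in range(NUM_EXPERTS)]

def moe_layer(x):
    """Route a (D,) token vector to its top-k experts and mix their outputs."""
    logits = x @ gate_w
    topk = np.argsort(logits)[-TOP_K:]          # indices of the k highest gate scores
    weights = np.exp(logits[topk] - logits[topk].max())
    weights /= weights.sum()                    # softmax over the selected experts only
    out = np.zeros(D)
    for w, idx in zip(weights, topk):
        w1, w2 = experts[idx]
        out += w * (np.maximum(x @ w1, 0) @ w2)  # ReLU MLP expert
    return out

token = rng.normal(size=D)
print(moe_layer(token).shape)
```

The key property the article alludes to is visible here: total parameters grow with `NUM_EXPERTS`, but each token only pays the compute cost of `TOP_K` experts.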
**Laying the Foundation for Multimodal Models: Unification under Transformer**
Wang Di believes that the future trend of large model development is towards multimodality. Tencent is actively deploying multimodal large models and unifying them under the Transformer architecture.
**How the Platform Layer Connects Upper-Layer Applications and Lower-Layer Computing Power**
Wang Di emphasized that large model platforms need to effectively connect upper-layer applications and lower-layer computing power. Tencent has implemented significant optimizations at the platform level, including:
* **Training and Inference Framework Optimization:** Enhancing training and inference efficiency.
* **Exploration of Compression and Acceleration Methods:** Reducing model size and latency.
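The article does not name which compression methods Tencent uses, so as a hedged illustration of the category, here is one common technique: post-training int8 weight quantization, which cuts weight storage 4x relative to float32 at the cost of a small, bounded reconstruction error.

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor int8 quantization of a float32 weight matrix."""
    scale = np.abs(w).max() / 127.0
    q = np.round(w / scale).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(1)
w = rng.normal(size=(256, 256)).astype(np.float32)

q, scale = quantize_int8(w)
err = np.abs(dequantize(q, scale) - w).max()

print(q.nbytes, w.nbytes)   # int8 copy is 4x smaller than float32
print(err <= scale)         # worst-case error is at most one quantization step
```

Production stacks typically go further (per-channel scales, activation quantization, distillation), but the storage/accuracy trade-off shown here is the basic mechanism behind "reducing model size and latency."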
**Technology Path Selection: Where Does Intuition Come From?**
Wang Di believes that technology path selection requires intuition and experience. During development, Tencent accumulated extensive experience and, through continuous experimentation and validation, ultimately arrived at the optimal solutions.
**Trillion-Parameter MoE Practice: Stability and Robustness**
Wang Di pointed out that the training and deployment of trillion-parameter MoE large models face significant challenges, such as:
* **Model Stability:** Ensuring model stability during training.
* **Model Robustness:** Ensuring model resilience against noise and attacks.
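The article does not describe Tencent's specific stabilization techniques, but two widely used safeguards for large-scale training runs are global-norm gradient clipping and skipping updates whose loss spikes far above the recent average. A hedged sketch (thresholds are illustrative, not Hunyuan's):

```python
import numpy as np

def clip_by_global_norm(grads, max_norm=1.0):
    """Scale all gradients so their combined L2 norm is at most max_norm."""
    total = np.sqrt(sum(float((g * g).sum()) for g in grads))
    scale = min(1.0, max_norm / (total + 1e-6))
    return [g * scale for g in grads], total

def should_skip_step(loss, history, window=10, factor=3.0):
    """Skip the update if loss jumps well above its recent moving average."""
    if len(history) < window:
        return False
    return loss > factor * float(np.mean(history[-window:]))

clipped, pre_norm = clip_by_global_norm([np.ones(4) * 2.0], max_norm=1.0)
print(pre_norm)  # norm before clipping
```

Both guards address the "model stability" point: they prevent a single bad batch or numerical blow-up from derailing a run that costs weeks of cluster time.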
**Computing Cluster Development and AI Infrastructure Outlook**
Wang Di stated that computing clusters are a crucial component of AI infrastructure. Tencent is actively investing in the construction of computing clusters and continuously exploring the development direction of AI infrastructure.
**Conclusion**
The development of Tencent’s Hunyuan large model showcases the entire chain of large model development and engineering, from building infrastructure and optimizing training and inference frameworks to real-world application scenarios, providing a unique perspective on understanding large models. Wang Di’s insights reveal the challenges and opportunities behind large model development, as well as future trends for the AI industry.
【来源】https://www.jiqizhixin.com/articles/2024-08-21-7