**News Title:** DeepSeek Open Sources 236B-Parameter MoE Model: Performance Approaches GPT-4-Turbo
Keywords: DeepSeek MoE, powerful performance, open-source release.
**News Content:**

### DeepSeek Open Sources MoE Model with Performance Close to GPT-4-Turbo
Recently, DeepSeek AI, a company exploring the essence of Artificial General Intelligence (AGI), open-sourced a powerful Mixture-of-Experts (MoE) language model, DeepSeek-V2. The model stands out for its lower training cost and more efficient inference.
DeepSeek-V2 has 236B total parameters, of which only 21B are activated per token, and it supports a context length of 128K tokens. Its benchmark results are striking: on AlignBench it outperforms GPT-4 and approaches GPT-4-Turbo, while on MT-Bench it rivals LLaMA3-70B and outperforms Mixtral 8x22B.
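For readers unfamiliar with sparse activation, the toy PyTorch sketch below illustrates how an MoE layer can carry a large total parameter count while exercising only a fraction of it per token: a learned router selects the top-k expert networks for each token, and the remaining experts stay idle. This is a minimal illustration under generic assumptions, not DeepSeek-V2's actual DeepSeekMoE architecture; all class names and sizes here are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyMoELayer(nn.Module):
    """Minimal Mixture-of-Experts layer: a router picks the top-k experts
    for each token, so only a small fraction of parameters runs per token."""

    def __init__(self, dim: int, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(dim, num_experts)  # gating network
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (tokens, dim)
        scores = self.router(x)                           # (tokens, num_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)    # each token's k experts
        weights = F.softmax(weights, dim=-1)              # normalize over the k picks
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e                     # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, k].unsqueeze(-1) * expert(x[mask])
        return out

# With 8 experts and top-2 routing, roughly a quarter of the expert
# parameters are exercised for any single token.
layer = ToyMoELayer(dim=512)
print(layer(torch.randn(16, 512)).shape)  # torch.Size([16, 512])
```

Applied at DeepSeek-V2's scale, the same principle means that of the 236B total parameters only about 21B do work for any given token, which is what keeps inference comparatively cheap despite the model's overall size.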
Notably, DeepSeek-V2 is especially strong in mathematics, coding, and reasoning. This profile places it among the leading language models and has prompted further reflection and exploration in the field of AGI.
The open-sourcing of DeepSeek-V2 not only showcases DeepSeek AI's technical strength in Mixture-of-Experts models but also opens up new research and application possibilities for academia and industry. With performance approaching GPT-4-Turbo at a lower training cost and with more efficient inference, the model is well positioned to play a significant role in future AGI research.

As the open-source community around DeepSeek-V2 grows, we look forward to further innovations and breakthroughs that jointly advance the development of AGI.
#### About DeepSeek AI:
DeepSeek AI is a company focused on exploring Artificial General Intelligence (AGI), committed to advancing the development and application of AI through cutting-edge machine learning. Its experienced team includes former journalists from renowned media outlets such as Xinhua News Agency, People's Daily, CCTV, The Wall Street Journal, and The New York Times, with deep backgrounds in news reporting and editing.
Source: https://www.jiqizhixin.com/articles/2024-05-07-3