
**News Title:** "Alibaba's Tongyi Qianwen Team Releases Qwen1.5-MoE-A2.7B, Its First High-Performance MoE Model: A Fraction of the Parameters, 1.74x the Inference Speed"

**Keywords:** Qwen1.5-MoE-A2.7B, Performance Boost, Cost Reduction

**News Content:** The Alibaba Tongyi Qianwen (Qwen) team recently announced Qwen1.5-MoE-A2.7B, the first MoE (Mixture of Experts) model in the Qwen series and another significant step forward in large-scale pretrained models. Despite having only 2.7 billion activated parameters, the model delivers performance on par with leading 7-billion-parameter models such as Mistral 7B and Qwen1.5-7B, underscoring its efficiency.
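
For readers who want to try the model, the sketch below shows one common way to load and run it with the Hugging Face `transformers` library. This is a minimal, illustrative example rather than an official recipe: the checkpoint id `Qwen/Qwen1.5-MoE-A2.7B`, the use of `device_map="auto"` (which requires `accelerate`), and the prompt are assumptions, and a sufficiently recent `transformers` release with Qwen2-MoE support is needed.

```python
# Minimal sketch (not an official example): loading Qwen1.5-MoE-A2.7B with transformers.
# Assumptions: the public checkpoint id is "Qwen/Qwen1.5-MoE-A2.7B", `accelerate` is
# installed for device_map="auto", and the installed transformers release supports Qwen2-MoE.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen1.5-MoE-A2.7B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Only ~2.7B parameters are activated per token, even though the full
# mixture-of-experts checkpoint holds more parameters overall.
prompt = "Mixture-of-Experts language models are efficient because"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```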

The innovation of Qwen1.5-MoE-A2.7B lies in its sharply reduced number of non-embedding parameters: roughly 2 billion, down from 6.5 billion in Qwen1.5-7B, a reduction of about two-thirds. This substantial compression of the model comes without sacrificing performance, and it cuts training cost by 75%, a major saving for training and maintaining large-scale models.
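
As a quick sanity check on the figures quoted above, the short snippet below reproduces the "about two-thirds" reduction using the article's rounded parameter counts; it is purely illustrative and not based on exact checkpoint sizes.

```python
# Quick check of the article's rounded figures (billions of non-embedding parameters).
dense_non_embedding = 6.5   # Qwen1.5-7B, per the article
moe_non_embedding = 2.0     # Qwen1.5-MoE-A2.7B, per the article

reduction = 1 - moe_non_embedding / dense_non_embedding
print(f"Non-embedding parameter reduction: {reduction:.0%}")  # about 69%, i.e. roughly two-thirds
```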

Furthermore, Qwen1.5-MoE-A2.7B delivers a marked improvement in inference speed, running about 1.74 times faster. In practice, this means the model can respond to user requests more quickly and serve applications more efficiently, an advance that matters both for user experience and for bringing AI into a wider range of applications.
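
The article does not describe how the 1.74x figure was measured; a speedup of this kind is typically reported as the ratio of generation throughput (tokens per second) between the MoE model and the dense baseline. The helper below is a hypothetical sketch of such a measurement, not the Qwen team's benchmark, and reuses model/tokenizer objects like those from the loading sketch above.

```python
# Hypothetical throughput benchmark: compare tokens/second of two causal LMs.
import time

def tokens_per_second(model, tokenizer, prompt: str, new_tokens: int = 128) -> float:
    """Generate a fixed number of tokens and return the generation throughput."""
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    start = time.perf_counter()
    model.generate(**inputs, max_new_tokens=new_tokens, min_new_tokens=new_tokens)
    return new_tokens / (time.perf_counter() - start)

# speedup = tokens_per_second(moe_model, tokenizer, prompt) \
#         / tokens_per_second(dense_model, tokenizer, prompt)
```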

The release has drawn wide attention in the ModelScope community, once again demonstrating the Qwen team's strength in model compression and performance optimization, and it lays a solid foundation for developing more efficient and more economical pretrained models in the future.

Source: https://mp.weixin.qq.com/s/6jd0t9zH-OGHE9N7sut1rg
