Yuanxiang Releases China’s Largest Open-Source MoE Model: XVERSE-MoE-A36B

Yuanxiang XVERSE has unveiled China’s largest open-source Mixture of Experts (MoE) model, XVERSE-MoE-A36B, with 255 billion total parameters and 36 billion activated parameters. The model achieves a leapfrog improvement over 100-billion-parameter models, significantly exceeding their performance while cutting training time by 30% and doubling inference performance, which translates into a substantial reduction in per-token cost.

In benchmarks against leading models, Yuanxiang’s MoE demonstrates superior performance across multiple authoritative evaluations. It outperforms prominent models such as the domestic Skywork-MoE (100 billion parameters), the traditional MoE champion Mixtral-8x22B, and the 314-billion-parameter open-source model Grok-1-A86B.

MoE: A Breakthrough in Model Architecture

MoE is a cutting-edge model architecture that combines multiple specialized expert models into a single larger model. Because only a subset of experts is activated for each input, the approach breaks the limitations of traditional scaling laws: the model can be expanded significantly without a dramatic increase in training and inference computational costs, maximizing model performance.
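For readers who want a concrete picture, the sketch below shows a minimal top-k routed MoE layer in PyTorch. It is illustrative only, not Yuanxiang’s implementation; all dimensions and expert counts are made up. The point is that a router scores the experts for each token and only the highest-scoring experts actually run, which is why the activated parameters per token can be a small fraction of the total.

```python
# Minimal sketch of a top-k routed MoE layer (illustrative only; not the
# XVERSE-MoE-A36B implementation -- sizes and counts here are made up).
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleMoELayer(nn.Module):
    def __init__(self, d_model=512, d_ff=2048, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(d_model, num_experts)  # router: scores each expert per token
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        ])

    def forward(self, x):  # x: (num_tokens, d_model)
        scores = F.softmax(self.gate(x), dim=-1)                 # (num_tokens, num_experts)
        topk_scores, topk_idx = scores.topk(self.top_k, dim=-1)  # keep the k best experts per token
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = topk_idx[:, slot] == e                    # tokens routed to expert e
                if mask.any():
                    weight = topk_scores[mask, slot].unsqueeze(-1)  # gating weight per token
                    out[mask] += weight * expert(x[mask])           # only selected experts run
        return out

tokens = torch.randn(4, 512)
print(SimpleMoELayer()(tokens).shape)  # torch.Size([4, 512])
```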

This groundbreaking technology has been adopted by leading models like Google’s Gemini-1.5, OpenAI’s GPT-4, and xAI’s Grok, highlighting its transformative potential.

Yuanxiang’s Commitment to Openness and Accessibility

Notably, Yuanxiang’s entire series of high-performance models is open source and free for commercial use. This empowers small and medium-sized enterprises, researchers, and developers to leverage these powerful models according to their specific needs.

Building on Previous Success

In April, Yuanxiang launched XVERSE-MoE-A4.2B. Unlike traditional MoE models (e.g., Mixtral 8x7B), where each expert is the size of a standard feed-forward network (FFN), Yuanxiang employs a more granular expert design, with each expert being only a quarter the size of a standard FFN. This enhances model flexibility and performance.
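As a back-of-the-envelope illustration of that sizing (the hidden dimensions below are assumptions, not XVERSE-MoE’s actual configuration), four quarter-size experts hold roughly the same parameter budget as one conventional FFN-sized expert, while giving the router four times as many units to choose among:

```python
# Illustrative parameter count: a standard-FFN-sized expert vs. a quarter-size
# fine-grained expert. The dimensions are assumptions, not XVERSE's actual config.
d_model, d_ff = 4096, 16384  # hypothetical transformer hidden sizes

def ffn_params(d_in: int, d_hidden: int) -> int:
    """Parameters of a two-matrix feed-forward block (up- and down-projection), ignoring biases."""
    return d_in * d_hidden + d_hidden * d_in

standard_expert = ffn_params(d_model, d_ff)       # one Mixtral-style, FFN-sized expert
fine_grained    = ffn_params(d_model, d_ff // 4)  # quarter-size, fine-grained expert

print(f"standard expert:     {standard_expert:,}")  # 134,217,728
print(f"fine-grained expert: {fine_grained:,}")     # 33,554,432
print(4 * fine_grained == standard_expert)          # True: same budget, finer routing
```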

Furthermore, Yuanxiang categorizes experts into two types: shared experts and non-shared experts. Shared experts remain active throughout the computation, while non-shared experts are selectively activated as needed. This design effectively compresses general knowledge into the shared experts’ parameters, minimizing knowledge redundancy among the non-shared experts.
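A minimal sketch of that split might look like the following (again illustrative, with made-up sizes and counts): the shared experts are applied to every token unconditionally, the non-shared experts are chosen per token by a router, and the two contributions are summed.

```python
# Sketch of shared plus routed (non-shared) experts; counts and sizes are
# assumptions for illustration, not the actual XVERSE-MoE configuration.
import torch
import torch.nn as nn
import torch.nn.functional as F

def make_expert(d_model=512, d_ff=512):
    return nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))

class SharedPlusRoutedMoE(nn.Module):
    def __init__(self, d_model=512, num_shared=2, num_routed=6, top_k=2):
        super().__init__()
        self.shared = nn.ModuleList([make_expert(d_model) for _ in range(num_shared)])
        self.routed = nn.ModuleList([make_expert(d_model) for _ in range(num_routed)])
        self.gate = nn.Linear(d_model, num_routed)  # routes only among non-shared experts
        self.top_k = top_k

    def forward(self, x):  # x: (num_tokens, d_model)
        # Shared experts: always active for every token (general knowledge).
        shared_out = sum(expert(x) for expert in self.shared)
        # Non-shared experts: only the top-k per token are activated.
        routed_out = torch.zeros_like(x)
        scores = F.softmax(self.gate(x), dim=-1)
        topk_scores, topk_idx = scores.topk(self.top_k, dim=-1)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.routed):
                mask = topk_idx[:, slot] == e
                if mask.any():
                    weight = topk_scores[mask, slot].unsqueeze(-1)
                    routed_out[mask] += weight * expert(x[mask])
        return shared_out + routed_out

print(SharedPlusRoutedMoE()(torch.randn(4, 512)).shape)  # torch.Size([4, 512])
```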

The release of XVERSE-MoE-A36B marks a significant milestone in Yuanxiang’s commitment to pushing the boundaries of AI. This powerful model, coupled with its open-source nature, promises to democratize access to advanced AI technology and accelerate innovation across diverse industries.

