
[Beijing, China] – iFLYTEK, in collaboration with Huawei, has announced a significant advance in domestic computing power: large-scale cross-node expert-parallel cluster inference of Mixture-of-Experts (MoE) models running on a domestically built compute cluster. The breakthrough marks a crucial step toward independent and controllable AI infrastructure in China.

The joint iFLYTEK-Huawei team reached this milestone through deep software-hardware co-innovation, validating and deploying the solution on Huawei's Ascend cluster with substantial performance gains. The achievement makes them the first in the industry to offer a complete solution built on domestic computing power, following DeepSeek's announcement of its MoE model training and inference plan.

Key Technical Innovations:

  • Operator Fusion: The team ran the Vector and Cube heterogeneous computing units in parallel during the MLA preprocessing stage. By fusing multiple small operators into atomic-level computing units, they eliminated the kernel-launch overhead of issuing many small operators, cutting MLA preprocessing latency by more than 50%.
  • Hybrid Parallel Strategy and Communication Optimization: A hybrid TP (Tensor Parallelism) + EP (Expert Parallelism) paradigm was constructed. TP was employed within each machine for the MLA computing layer to leverage high-speed intra-machine interconnects and minimize cross-machine communication overhead. An innovative layered MoE expert scheduling scheme evenly distributes expert computation across 64 cards. A customized AllToAll communication protocol improved expert data-exchange efficiency by 40%, and a dual-layer (cross-machine/intra-machine) communication architecture cut cross-machine traffic by 60% through layered optimization.
  • Load Balancing: A routing expert load balancing algorithm was developed, achieving a load difference of less than 10% between cards, increasing cluster throughput by 30%.
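The operator-fusion idea above can be sketched in plain NumPy. This is not the Ascend implementation (the fused kernels and the Vector/Cube scheduling are not public); it only illustrates the semantics: a chain of small per-step operations versus one composite expression standing in for a single fused kernel, with identical numerical results.

```python
import numpy as np

def mla_preprocess_unfused(x, w, scale, bias):
    # Each statement stands in for a separate small-operator kernel launch:
    t = x @ w                  # projection (Cube-unit-style matmul)
    t = t * scale              # elementwise scale (Vector-unit-style op)
    t = t + bias               # elementwise bias
    return np.maximum(t, 0.0)  # activation

def mla_preprocess_fused(x, w, scale, bias):
    # One composite expression standing in for a single fused kernel;
    # the result is identical, only the number of launches changes.
    return np.maximum(x @ w * scale + bias, 0.0)

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))   # toy activations (shapes are illustrative)
w = rng.standard_normal((8, 8))   # toy projection weights
scale, bias = 0.5, 0.1
assert np.allclose(mla_preprocess_unfused(x, w, scale, bias),
                   mla_preprocess_fused(x, w, scale, bias))
```

On real accelerators the win comes from fewer kernel launches and less intermediate-memory traffic, which is what the reported 50%+ latency reduction attributes to fusion.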
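The dual-layer communication architecture can be illustrated with a toy message-count model. The actual protocol is not published; the sketch below only assumes that in the layered scheme each machine first aggregates its cards' per-destination-machine payloads over the fast local interconnect, so only one combined message crosses the network per ordered machine pair. The function names and the 8-machines-by-8-cards layout (matching the article's 64 cards) are illustrative.

```python
def cross_machine_messages_naive(machines, cards_per_machine):
    # Flat AllToAll: every card sends one message to every card
    # on every other machine.
    total_cards = machines * cards_per_machine
    return total_cards * (machines - 1) * cards_per_machine

def cross_machine_messages_layered(machines, cards_per_machine):
    # Dual-layer scheme: cards combine per-destination-machine payloads
    # inside the machine first, then one aggregated message crosses the
    # network per ordered machine pair and is scattered locally on arrival.
    return machines * (machines - 1)

# 8 machines x 8 cards = the 64-card setup mentioned in the article.
naive = cross_machine_messages_naive(8, 8)      # 3584 cross-machine messages
layered = cross_machine_messages_layered(8, 8)  # 56 cross-machine messages
```

The model counts messages, not bytes, so it overstates the saving relative to the reported 60% traffic reduction; it is meant only to show why pushing aggregation onto the intra-machine interconnect shrinks cross-machine exchange.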
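The routing-expert load-balancing algorithm itself is not described in the announcement; a minimal stand-in is the classic greedy longest-processing-time (LPT) heuristic sketched below, which assigns each expert, heaviest traffic first, to the currently lightest card. The 256 experts, their load distribution, and the function names are assumptions for illustration; only the 64-card figure comes from the article.

```python
import random

def balance_experts(expert_loads, num_cards):
    # Greedy LPT heuristic: place each expert, heaviest first,
    # on the card that is currently lightest.
    card_loads = [0.0] * num_cards
    for load in sorted(expert_loads, reverse=True):
        lightest = card_loads.index(min(card_loads))
        card_loads[lightest] += load
    return card_loads

random.seed(0)
# 256 hypothetical routed experts with uneven per-expert traffic,
# spread across the 64 cards mentioned in the article.
loads = [random.uniform(0.5, 1.5) for _ in range(256)]
per_card = balance_experts(loads, 64)
imbalance = (max(per_card) - min(per_card)) / max(per_card)
```

On inputs like these, greedy LPT typically lands the per-card load spread within a few percent; the production algorithm, which the announcement credits with under-10% imbalance and a 30% throughput gain, presumably also has to track routing shifts online.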

Performance Gains:

These innovations resulted in significant performance improvements on domestic computing power:

  • Single-card static memory usage fell to one quarter of the dual-machine deployment baseline, a 75% reduction.
  • Expert computing density increased by 4 times.
  • Inference throughput increased by 3.2 times.
  • End-to-end latency decreased by 50%.

Impact and Applications:

This breakthrough solution will be applied to accelerate the training of iFLYTEK Spark deep inference models, with an expected 200% increase in training and inference efficiency. Furthermore, the inference engine based on this solution enables efficient inference of DeepSeek V3 and R1 on domestic computing power.

iFLYTEK recently upgraded its Spark X1 deep inference model, achieving leading performance on a range of Chinese mathematical tasks and matching DeepSeek R1 and OpenAI o1 across benchmarks, despite having an order of magnitude fewer parameters than its industry peers. iFLYTEK emphasizes its commitment to the domestic ecosystem, noting that Spark X1 is currently the only deep inference model trained entirely on domestic computing power.

iFLYTEK plans to keep iterating and upgrading the Spark large model and to accelerate the Spark API for developers through this inference engine, including the Spark large-model series APIs and related model APIs open-sourced on the Star MaaS platform.

References:

  • iFLYTEK RESEARCH. (2024). 科大讯飞联合华为率先实现国产算力大规模跨节点专家并行集群推理 [iFLYTEK and Huawei Achieve Breakthrough in Domestic Computing Power with Large-Scale Parallel Inference]. Retrieved from [Insert original source URL here if available]

