Huawei recently introduced Pangu-π, a new large language model architecture that improves on the conventional Transformer. By strengthening nonlinearity, it mitigates feature collapse and makes the model's output markedly more expressive. Trained on the same data, Pangu-π (7B) outperforms the comparably sized LLaMA 2 model across multiple tasks while also delivering a 10% inference speedup. At the 1B scale, Pangu-π's performance reaches the current state of the art.
Huawei has also used the Pangu-π architecture to train “Yunshan”, a large model for finance and law. The release is expected to further consolidate Huawei's leading position in artificial intelligence. The Pangu-π architecture is reportedly the result of joint research by Huawei's Noah's Ark Lab and other teams, and is described as an important breakthrough for China in the field.
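The "feature collapse" mentioned above can be illustrated with a toy sketch. This is not Pangu-π's method or code; it only shows the general phenomenon, under the assumption that each layer is reduced to a purely linear, attention-like averaging step. The helper names (`mix`, `mean_pairwise_cosine`) and all parameters are hypothetical and chosen for illustration.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity between two vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_pairwise_cosine(tokens):
    """Average cosine similarity over all distinct token pairs.

    Values close to 1.0 mean the token features have become nearly
    indistinguishable, i.e. they have collapsed.
    """
    n = len(tokens)
    sims = [cosine(tokens[i], tokens[j])
            for i in range(n) for j in range(i + 1, n)]
    return sum(sims) / len(sims)


def mix(tokens, self_weight=0.5):
    """One linear, attention-like step (a toy assumption, not a real layer).

    Each token becomes a weighted average of itself and the mean of all
    tokens, which is what uniform attention weights would produce.
    """
    n, d = len(tokens), len(tokens[0])
    mean = [sum(t[k] for t in tokens) / n for k in range(d)]
    return [[self_weight * t[k] + (1.0 - self_weight) * mean[k]
             for k in range(d)]
            for t in tokens]


# Hypothetical toy input: 8 random token features of dimension 16.
rng = random.Random(0)
tokens = [[rng.gauss(0.0, 1.0) for _ in range(16)] for _ in range(8)]

before = mean_pairwise_cosine(tokens)

# Stack 12 purely linear mixing steps with no nonlinearity in between.
for _ in range(12):
    tokens = mix(tokens)

after = mean_pairwise_cosine(tokens)

print(f"mean pairwise cosine before mixing: {before:.3f}")
print(f"mean pairwise cosine after mixing:  {after:.3f}")
```

In this sketch the similarity climbs toward 1 as the averaging steps are stacked, which is the kind of collapse that added nonlinearity is meant to counteract. How Pangu-π introduces that nonlinearity is not described in this article.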
Source: https://mp.weixin.qq.com/s/Beg3yNa_dKZKX3Fx1AZqOw