Title: Huawei Launches Pangu-π Architecture, Outperforming LLaMA 2
Keywords: Huawei, Pangu-π, Large Language Model
Content: Huawei's Noah's Ark Laboratory and collaborators have jointly introduced a new large language model architecture, Pangu-π. It improves on the traditional Transformer by enhancing nonlinearity, which significantly reduces feature collapse and directly makes the model's outputs more expressive. Trained on the same data, Pangu-π (7B) surpasses LLaMA 2 and other models of the same scale across multiple tasks while achieving roughly 10% faster inference, and at the 1B scale it reaches state-of-the-art performance.
Pangu-π is a significant innovation by Huawei's team in natural language processing. Its key contribution is architectural: by injecting additional nonlinearity into the traditional Transformer, it counters feature collapse, the tendency of token representations in deep layers to grow increasingly similar and lose expressive diversity, and thereby produces more expressive outputs.
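The report does not define feature collapse formally. A common informal diagnostic is to check how similar a layer's token representations have become; the sketch below (PyTorch; the function name and the mean pairwise cosine-similarity metric are our own illustrative choices, not taken from the Pangu-π paper) shows one such measurement:

```python
import torch

def feature_diversity(hidden_states: torch.Tensor) -> float:
    """Mean pairwise cosine similarity between token features.

    hidden_states: (seq_len, dim) activations from one Transformer layer.
    A value near 1.0 means the representations have collapsed toward a
    single direction; lower values indicate more diversity.
    """
    x = torch.nn.functional.normalize(hidden_states, dim=-1)  # unit-norm rows
    sim = x @ x.T                                             # (seq, seq) cosine matrix
    n = sim.size(0)
    # Average the off-diagonal entries only (self-similarity is always 1).
    return ((sim.sum() - n) / (n * (n - 1))).item()

# Toy check: random features are diverse; identical rows are fully collapsed.
torch.manual_seed(0)
print(feature_diversity(torch.randn(16, 64)))  # close to 0
print(feature_diversity(torch.ones(16, 64)))   # exactly 1.0
```

Under a metric like this, "reducing feature collapse" means keeping the similarity score low as network depth increases.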
According to the report, when trained on the same data, Pangu-π (7B) outperforms LLaMA 2 and other models of the same scale on multiple tasks while delivering about 10% faster inference, indicating both higher accuracy and better efficiency on complex tasks.
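The article credits these gains to the extra nonlinearity but does not spell out the mechanism. Purely as a hypothetical illustration, one generic way to raise a Transformer block's nonlinearity is to add a small nonlinear branch alongside the usual residual connection; in the sketch below (PyTorch), the class name NonlinearShortcut and all hyperparameters are invented for illustration and are not Pangu-π's actual design:

```python
import torch
import torch.nn as nn

class NonlinearShortcut(nn.Module):
    """Hypothetical parallel nonlinear branch added to a sub-layer's
    residual connection; one generic way to inject extra nonlinearity,
    NOT the mechanism described in the Pangu-π paper."""

    def __init__(self, dim: int, hidden: int = 64):
        super().__init__()
        self.branch = nn.Sequential(
            nn.Linear(dim, hidden),
            nn.GELU(),                 # the added nonlinearity
            nn.Linear(hidden, dim),
        )

    def forward(self, x: torch.Tensor, sublayer_out: torch.Tensor) -> torch.Tensor:
        # Standard residual (x + sublayer_out) plus a cheap nonlinear path.
        return x + sublayer_out + self.branch(x)

# Usage around a self-attention sub-layer (shapes: batch, seq, dim).
dim = 128
attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
shortcut = NonlinearShortcut(dim)
x = torch.randn(2, 10, dim)
attn_out, _ = attn(x, x, x)
y = shortcut(x, attn_out)
print(y.shape)  # torch.Size([2, 10, 128])
```

The intuition behind such a design is that an extra nonlinear path gives each layer more distinct directions to write into, which is exactly the diversity that feature-collapse metrics reward.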
In addition, Huawei has built "Yunshan", a large model for the finance and legal domains, on the Pangu-π architecture. The model is meant to help professionals in those fields better understand and apply natural language processing technology.
Source: https://mp.weixin.qq.com/s/Beg3yNa_dKZKX3Fx1AZqOw