Title: OpenNLPLab Releases Lightning Attention-2, a New-Generation Attention Mechanism
Keywords: attention mechanism, long sequence, training and inference
News content: The OpenNLPLab team recently released Lightning Attention-2, a new linear attention mechanism aimed at solving the efficiency problems of large language models in long-sequence scenarios. The mechanism keeps the training and inference cost of long sequences consistent with that of a 1K-token sequence. The researchers report that, until a memory bottleneck is reached, growing the sequence length indefinitely has no significant negative impact on training speed, which makes pretraining on unboundedly long sequences feasible. At the same time, the inference cost of extremely long texts is comparable to, or even lower than, that of 1K-token texts, greatly reducing the inference cost of current large language models. The work was published at a well-known academic conference in the AI field and has attracted broad attention from industry experts.
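To make the scaling claim concrete, below is a minimal sketch of causal linear attention maintained as a running key-value state, the general family of mechanisms Lightning Attention-2 belongs to. The function name, tensor shapes, and the omission of feature maps, normalization, and any block-wise tiling are illustrative assumptions for this sketch, not the authors' implementation.

```python
import numpy as np

def causal_linear_attention(q, k, v):
    """Causal linear attention via a running (d_k x d_v) state.

    Per-token cost is O(d_k * d_v), independent of how many earlier
    tokens exist, which is what keeps long-sequence cost flat. This is
    a generic linear-attention sketch (no feature map, no normalization,
    no block tiling), not the actual Lightning Attention-2 kernel.
    """
    seq_len, d_k = q.shape
    d_v = v.shape[1]
    kv_state = np.zeros((d_k, d_v))       # running sum of outer(k_t, v_t)
    out = np.empty((seq_len, d_v))
    for t in range(seq_len):
        kv_state += np.outer(k[t], v[t])  # constant-time state update
        out[t] = q[t] @ kv_state          # attend to the whole prefix at once
    return out

# Toy usage: per-step cost does not grow with position t.
rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((8, 4)) for _ in range(3))
print(causal_linear_attention(q, k, v).shape)  # (8, 4)
```

By contrast, standard softmax attention must revisit all previous keys and values at every step, so its per-token cost grows with context length.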
Source: https://www.jiqizhixin.com/articles/2024-01-18-5