The OpenNLPLab team has released Lightning Attention-2, a new attention mechanism intended to resolve the difficulties large language models face when processing long sequences. The team has proposed and open-sourced this new linear attention mechanism, which keeps the training and inference cost of long sequences on par with that of a 1K-length sequence. As a result, increasing the sequence length does not slow down model training until a memory bottleneck is reached, making effectively unlimited-length pre-training feasible.
In addition, the inference cost of very long texts matches, or even falls below, the cost of 1K tokens, which can substantially reduce the inference cost of current large language models. The new mechanism is significant for the field of large language models and injects fresh momentum into the development of artificial intelligence.
Title: Lightning Attention-2: A Breakthrough in Handling Long Sequences
Keywords: Attention Mechanism, Long Sequences, Model Optimization
News content:
The OpenNLPLab team has recently released a new attention mechanism, Lightning Attention-2, which aims to solve the long-standing problem of handling long sequences in large language models. The team has proposed and open-sourced this new linear attention mechanism, which keeps the training and inference cost of long sequences consistent with that of a 1K-length sequence. This breakthrough means that, until a memory bottleneck is encountered, increasing the sequence length does not slow down model training, making effectively unlimited-length pre-training possible.
Furthermore, the inference cost for extremely long texts is also consistent with, or even lower than, the cost for 1K tokens, significantly reducing the inference cost of current large language models. The introduction of this mechanism is not only of great significance for large language models but also injects new vitality into the development of artificial intelligence.
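To make the cost claim concrete, the sketch below shows the generic recurrent form of causal linear attention, in which each new token updates a fixed-size key-value state rather than attending over the whole history, so the work per token does not grow with sequence length. This is a minimal illustration of the linear-attention family under assumed shapes and names (linear_attention_decode, kv_state are hypothetical), not the actual Lightning Attention-2 implementation; feature maps and normalization are omitted for brevity.

import numpy as np

def linear_attention_decode(q_tokens, k_tokens, v_tokens):
    # Recurrent (right-product) form of causal linear attention.
    # kv_state has a fixed shape (d, d), so each step costs O(d^2)
    # regardless of how many tokens have already been processed.
    d = q_tokens.shape[-1]
    kv_state = np.zeros((d, d))            # running sum of outer(k_t, v_t)
    outputs = []
    for q_t, k_t, v_t in zip(q_tokens, k_tokens, v_tokens):
        kv_state += np.outer(k_t, v_t)     # constant-cost state update
        outputs.append(q_t @ kv_state)     # constant-cost readout
    return np.stack(outputs)

# Per-token work is the same whether the sequence holds 1K or 100K tokens.
rng = np.random.default_rng(0)
q, k, v = (rng.normal(size=(8, 16)) for _ in range(3))
print(linear_attention_decode(q, k, v).shape)  # (8, 16)

Because the state never grows with the sequence, decoding one more token costs the same at position 100,000 as at position 1,000, which is the property the article attributes to Lightning Attention-2's linear attention.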
Source: https://www.jiqizhixin.com/articles/2024-01-18-5