Title: New Attention Mechanism Tackles the Long-Sequence Challenge
Keywords: Attention Mechanism, Long Sequence, Model Optimization
News content:
The OpenNLPLab team has recently released a new attention mechanism called Lightning Attention-2, which aims to solve, once and for all, the challenge of handling long sequences in large language models. This breakthrough keeps the training and inference cost of long sequences in line with that of a 1K-length sequence, making unlimited-length pre-training feasible.
According to the report, Lightning Attention-2 is a new linear attention mechanism that can process sequences of unbounded length without hitting memory bottlenecks. During training, sequence length is therefore no longer a limiting factor, which improves both the efficiency and the effectiveness of training.
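For intuition about the "no memory bottleneck" claim, the sketch below shows the generic causal linear-attention recurrence that this family of mechanisms builds on: instead of materializing a seq_len x seq_len score matrix, it carries a fixed d x d state. This is a minimal NumPy illustration under that general assumption, not the OpenNLPLab team's Lightning Attention-2 kernel, and the feature map phi used here is an arbitrary placeholder.

# Hypothetical sketch of causal linear attention in NumPy. It illustrates the
# generic kernel-trick recurrence, NOT the Lightning Attention-2 kernel itself:
# a fixed d x d state replaces the seq_len x seq_len attention matrix.
import numpy as np

def causal_linear_attention(Q, K, V, eps=1e-6):
    """Q, K, V: arrays of shape (seq_len, d); returns outputs of shape (seq_len, d)."""
    phi = lambda x: np.maximum(x, 0.0) + 1.0  # simple positive feature map (placeholder assumption)
    d = Q.shape[1]
    S = np.zeros((d, d))      # running sum of phi(k_j) v_j^T over past tokens
    z = np.zeros(d)           # running sum of phi(k_j), used for normalization
    out = np.empty_like(V)
    for t in range(Q.shape[0]):
        q, k, v = phi(Q[t]), phi(K[t]), V[t]
        S += np.outer(k, v)                   # update the fixed-size state
        z += k
        out[t] = (q @ S) / (q @ z + eps)      # O(d^2) work per token, no n x n matrix
    return out

# Example: a 4096-token sequence with 64-dim heads; the state is only 64 x 64.
Q, K, V = (np.random.randn(4096, 64) for _ in range(3))
print(causal_linear_attention(Q, K, V).shape)  # (4096, 64)

Because the state size depends only on the head dimension, memory stays flat as the sequence grows, which is the property the announcement highlights.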
In addition, for inference over ultra-long text, the mechanism's cost is reportedly even lower than that of processing 1K tokens, which would greatly reduce the inference cost of current large language models and bring a revolutionary change to the field of natural language processing.
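The inference-cost claim has a similar intuition in the recurrent view of linear attention: decoding carries the same fixed-size state instead of a key-value cache that grows with context, so each new token costs the same regardless of prefix length. The following sketch is again a hypothetical NumPy illustration of that general pattern, not the published Lightning Attention-2 implementation.

# Hypothetical sketch of recurrent decoding with a linear-attention state (NumPy).
# Unlike a softmax KV cache, which grows with every token of context, this state
# has a fixed size, so each newly generated token costs the same amount of work
# whether the prefix is 1K or 1M tokens.
import numpy as np

class LinearAttentionDecoderState:
    def __init__(self, d, eps=1e-6):
        self.S = np.zeros((d, d))  # sum of phi(k) v^T over all tokens seen so far
        self.z = np.zeros(d)       # sum of phi(k), used as the normalizer
        self.eps = eps

    @staticmethod
    def _phi(x):
        return np.maximum(x, 0.0) + 1.0  # placeholder positive feature map (assumption)

    def step(self, q, k, v):
        """Consume one token's q, k, v (each of shape (d,)) and return its output.
        Work and memory per step are O(d^2), independent of how many tokens came before."""
        q, k = self._phi(q), self._phi(k)
        self.S += np.outer(k, v)
        self.z += k
        return (q @ self.S) / (q @ self.z + self.eps)

# Decode many tokens: memory stays at one 64 x 64 state the whole time.
d = 64
state = LinearAttentionDecoderState(d)
for _ in range(10_000):
    q, k, v = (np.random.randn(d) for _ in range(3))
    y = state.step(q, k, v)
print(y.shape)  # (64,)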
Source: https://www.jiqizhixin.com/articles/2024-01-18-5
Views: 5