Shanghai, March 5, 2025 – In artificial intelligence, innovation in model architecture has been a key engine driving progress toward AGI (artificial general intelligence). Shanghai AI Lab today announced a new technique called "Mixture-of-Memories (MoM)", which gives linear attention models a "sparse memory" capability and aims to break through the performance ceiling of existing linear sequence modeling methods. The lab describes the advance as a significant step forward for long-sequence modeling.

Looking back at the path toward AGI, from the initial scaling of pretrained models and data, through instruction fine-tuning (SFT) and reinforcement learning from human feedback (RLHF), to improvements in reasoning ability, finding the right dimension to scale has always been the central challenge. Since its introduction in 2017, the Transformer architecture has dominated natural language processing thanks to its powerful "lossless memory". However, Transformers pay a heavy KV-cache cost, which limits their use in long-sequence modeling.

In recent years, linear sequence modeling methods such as the Mamba family and the RWKV family have reduced computational complexity by maintaining a fixed-size RNN memory state. However, these methods tend to hit a lower performance ceiling and struggle on tasks that require long-term memory.
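For readers unfamiliar with this family of methods, here is a minimal NumPy sketch of the fixed-size memory state such models maintain: the state never grows with sequence length, which is where the linear training cost and constant-cost inference come from. The unnormalized update rule and the absence of any gating or decay are simplifications; Mamba and RWKV each use their own, more elaborate parameterizations.

```python
import numpy as np

def linear_attention_recurrence(q, k, v):
    """Process a sequence with one fixed-size memory state.

    q, k have shape (seq_len, d_k); v has shape (seq_len, d_v).
    The state S is (d_k, d_v) no matter how long the sequence is,
    which yields linear training cost and constant-cost inference.
    """
    seq_len, d_k = q.shape
    d_v = v.shape[1]
    S = np.zeros((d_k, d_v))          # fixed-size memory state
    outputs = np.zeros((seq_len, d_v))
    for t in range(seq_len):
        S = S + np.outer(k[t], v[t])  # RNN-style state update
        outputs[t] = q[t] @ S         # read out with the current query
    return outputs

# Toy usage: 16 tokens, key/value dimension 8.
rng = np.random.default_rng(0)
q, k, v = (rng.normal(size=(16, 8)) for _ in range(3))
print(linear_attention_recurrence(q, k, v).shape)  # (16, 8)
```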

The Shanghai AI Lab researchers argue that future architectures must combine strong memory-scaling capability with low complexity in sequence length. Efficient attention mechanisms, such as linear or sparse attention, are a necessary condition for long-sequence modeling, while how to scale memory remains an important open question.

MoM is designed to address exactly this problem. Its core idea borrows from MoE (Mixture of Experts): a router dispatches each token to several KV memories, scaling the model along the memory dimension. Each memory is updated with an RNN-style computation, so the overall training complexity stays linear in sequence length and inference complexity is constant.

MoM also introduces shared-memory and local-memory mechanisms, which handle global and local information respectively and further improve model performance. Experimental results show that MoM excels at recall-intensive tasks; at 1.3 billion parameters it is already competitive with the Transformer architecture.
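The announcement does not spell out implementation details, but based on the description above, a minimal sketch of the MoM idea might look like the following: a learned router sends each token to a top-k subset of local memories, each chosen memory is updated with the same RNN-style rule as in the earlier sketch, and a shared memory that every token writes to carries global information. The function names, router parameterization, top-k value, and lack of gating are all illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def mom_forward(q, k, v, W_router, num_local=4, top_k=2):
    """Illustrative Mixture-of-Memories routing (not the paper's exact rule).

    q, k: (seq_len, d_k); v: (seq_len, d_v); W_router: (d_k, num_local).
    Keeps `num_local` routed local memories plus one always-on shared memory,
    each of fixed size (d_k, d_v), so cost stays linear in sequence length.
    """
    seq_len, d_k = q.shape
    d_v = v.shape[1]
    local = np.zeros((num_local, d_k, d_v))   # sparsely updated local memories
    shared = np.zeros((d_k, d_v))             # shared memory, written by every token
    outputs = np.zeros((seq_len, d_v))
    for t in range(seq_len):
        scores = softmax(k[t] @ W_router)     # router over local memories
        chosen = np.argsort(scores)[-top_k:]  # top-k sparse dispatch
        shared += np.outer(k[t], v[t])        # global information
        out = q[t] @ shared
        for i in chosen:
            local[i] += scores[i] * np.outer(k[t], v[t])  # weighted RNN-style update
            out = out + scores[i] * (q[t] @ local[i])     # read from activated memories
        outputs[t] = out
    return outputs

# Toy usage with random projections of a tiny sequence.
rng = np.random.default_rng(0)
d_k, d_v, L = 8, 8, 16
q, k = rng.normal(size=(L, d_k)), rng.normal(size=(L, d_k))
v = rng.normal(size=(L, d_v))
W_router = rng.normal(size=(d_k, 4))
print(mom_forward(q, k, v, W_router).shape)  # (16, 8)
```

At inference time, only the fixed-size local and shared memories need to be carried forward, which is what keeps per-token cost constant even as the number of memories grows.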

"We hope MoM can break the fixed patterns that current linear sequence modeling methods follow in their gates and RNN update rules, and allow memory size to scale sparsely and without limit," the Shanghai AI Lab research team said.

The paper has been published on arXiv, the code is open-sourced on GitHub, and the model weights have been uploaded to the Hugging Face Hub.

The release of MoM marks an important step for linear attention models in long-sequence modeling. Shanghai AI Lab says it will continue to drive innovation in AI technology and contribute to the development of AGI.

References:

  • Shanghai AI Lab. (2025). Mixture-of-Memories: Scaling Linear Attention with Sparse Memory. arXiv preprint arXiv:2502.13685.

Keywords: Shanghai AI Lab, Mixture-of-Memories, linear attention, sparse memory, long-sequence modeling, AGI, Transformer, MoE, RNN, machine learning, artificial intelligence.

