**Headline:** Google Unveils New AI Models Hawk and Griffin, Pushing Language Understanding to New Limits
**Keywords:** Foundation models, recurrent blocks, Google DeepMind
**Body:**
Google DeepMind has introduced two new foundation models, Hawk and Griffin, in a paper that proposes a novel gated linear recurrent layer called the RG-LRU layer. The researchers designed a new recurrent block based on the RG-LRU layer to replace multi-query attention (MQA).
Using the recurrent block, the researchers built two new foundation models: Hawk and Griffin. Hawk is a hybrid model that combines MLPs and recurrent blocks, while Griffin combines MLPs, recurrent blocks, and local attention.
Compared to traditional Transformer models, Hawk and Griffin demonstrate improved performance on language understanding and generation tasks. For example, on the GLUE benchmark, Hawk and Griffin achieve 2.3% and 1.8% higher scores than the T5 model, respectively.
The researchers attribute the effectiveness of Hawk and Griffin to the RG-LRU layer and the new recurrent block, which they say can effectively capture long-range dependencies in sequential data. This enables Hawk and Griffin to better understand and generate text.
The introduction of Hawk and Griffin marks a significant advancement in the field of foundation models. These models are expected to play a role in a wide range of applications, including natural language processing, computer vision, and machine translation.
**Technical Details:**
The RG-LRU layer is a gated linear recurrent layer that combines a linear recurrent unit (LRU) with gating mechanisms in the spirit of GRUs and LSTMs. The RG-LRU layer uses a gating mechanism to control the flow of information and a linear recurrence to update the hidden state.
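To make the update rule concrete, here is a minimal JAX sketch of a gated linear recurrence in the spirit of the RG-LRU described above. The function and parameter names (`rg_lru_scan`, `w_a`, `w_i`, `log_lambda`) and the exact gate parameterization are illustrative assumptions, not DeepMind's released code:

```python
# Illustrative sketch of a gated linear recurrence (RG-LRU-style);
# the gate parameterization here is an assumption for clarity.
import jax
import jax.numpy as jnp


def rg_lru_scan(x, w_a, w_i, log_lambda):
    """Run a gated linear recurrence over a sequence.

    x:          (seq_len, dim) input sequence
    w_a, w_i:   (dim, dim) projections for the recurrence and input gates
    log_lambda: (dim,) learnable log of the per-channel decay rate
    """
    # Gates are computed from the input alone, with no hidden-state
    # feedback, which is what keeps the recurrence linear in h.
    r = jax.nn.sigmoid(x @ w_a)               # recurrence gate in (0, 1)
    i = jax.nn.sigmoid(x @ w_i)               # input gate in (0, 1)
    a = jnp.exp(-r * jnp.exp(log_lambda))     # gated per-step decay in (0, 1)

    def step(h, inputs):
        a_t, i_t, x_t = inputs
        # Decay the old state and mix in the gated input; the sqrt term
        # keeps the state magnitude roughly constant across time steps.
        h = a_t * h + jnp.sqrt(1.0 - a_t**2) * (i_t * x_t)
        return h, h

    h0 = jnp.zeros(x.shape[-1])
    _, hs = jax.lax.scan(step, h0, (a, i, x))
    return hs  # (seq_len, dim) hidden states


# Toy usage on random data.
key = jax.random.PRNGKey(0)
x = jax.random.normal(key, (16, 8))
w_a = jax.random.normal(key, (8, 8)) * 0.1
w_i = jax.random.normal(key, (8, 8)) * 0.1
log_lambda = jnp.zeros(8)
print(rg_lru_scan(x, w_a, w_i, log_lambda).shape)  # (16, 8)
```

Because the gates depend only on the current input and never on the hidden state, the recurrence stays linear in `h`, which is what permits fast, stable scans over long sequences.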
The new recurrent block consists of an RG-LRU layer and an attention mechanism. The attention mechanism is used to capture local dependencies in sequential data.
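As an illustration of the local-dependency idea, the sketch below implements a simple sliding-window (local) causal attention, assuming a single head and a fixed window size; this is a generic construction, not the paper's exact mechanism:

```python
# Illustrative sliding-window causal attention: token t attends only to
# tokens in [t - window + 1, t]. Names and window size are assumptions.
import jax
import jax.numpy as jnp


def local_attention(q, k, v, window=4):
    """Single-head attention restricted to a causal window of `window` tokens."""
    seq_len, dim = q.shape
    scores = (q @ k.T) / jnp.sqrt(dim)            # (seq_len, seq_len)
    pos = jnp.arange(seq_len)
    # Causal constraint (j <= i) combined with locality (i - j < window).
    mask = (pos[None, :] <= pos[:, None]) & (pos[:, None] - pos[None, :] < window)
    scores = jnp.where(mask, scores, -jnp.inf)    # mask out-of-window pairs
    return jax.nn.softmax(scores, axis=-1) @ v    # (seq_len, dim)


key = jax.random.PRNGKey(0)
q = k = v = jax.random.normal(key, (8, 16))
print(local_attention(q, k, v).shape)  # (8, 16)
```

Restricting each token to a fixed window keeps the cost of attention linear in sequence length, which is the usual motivation for pairing local attention with a recurrence that handles long-range context.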
The Hawk model consists of an MLP and multiple recurrent blocks. The MLP is used to extract global features from the input sequence, while the recurrent blocks are used to capture long-range dependencies in the sequence.
The Griffin model consists of an MLP, multiple recurrent blocks, and a local attention mechanism. The local attention mechanism is used to capture local dependencies in sequential data.
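For intuition about how the two architectures described above differ, here is a schematic JAX sketch of the layer stacking: Hawk alternates recurrent blocks with MLPs, while Griffin additionally interleaves local-attention blocks. The mixing functions are stand-in toys and the layer patterns are assumptions for illustration only:

```python
# Schematic residual stacking for Hawk- and Griffin-style models.
# All block bodies below are placeholder toys, not the paper's blocks.
import jax
import jax.numpy as jnp


def rms_norm(x):
    return x / jnp.sqrt(jnp.mean(x**2, axis=-1, keepdims=True) + 1e-6)


def mlp(x):                      # stand-in feed-forward block
    return jax.nn.gelu(x) * 0.5


def recurrent_block(x):          # causal running mean, a stand-in recurrence
    return jnp.cumsum(x, axis=0) / (jnp.arange(x.shape[0])[:, None] + 1)


def local_attention_block(x):    # stand-in for sliding-window attention
    return x


def forward(x, pattern):
    # Each layer applies a pre-norm residual update: x <- x + block(norm(x)).
    blocks = {"R": recurrent_block, "A": local_attention_block, "M": mlp}
    for name in pattern:
        x = x + blocks[name](rms_norm(x))
    return x


x = jnp.ones((8, 16))
hawk_out = forward(x, pattern="RM" * 3)       # recurrent + MLP layers only
griffin_out = forward(x, pattern="RMAM" * 2)  # adds local-attention layers
```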
**Applications:**
Hawk and Griffin can be applied to a wide range of natural language processing tasks, including text classification, question answering, and machine translation. Additionally, these models could find applications in other domains, such as computer vision.
The researchers say they plan to continue developing Hawk and Griffin and exploring their applications in other areas.
【来源】https://mp.weixin.qq.com/s/RtAZiEzjRWgqQw3yu3lvcg