News Title: “New TTT Architecture Outpaces Transformers, Marking Revolutionary Leap in Large Model Performance”
Keywords: Language Model, TTT Architecture, Performance Enhancement
News Content: In a significant breakthrough for the field of AI, a novel language model architecture, Test-Time Training (TTT), is challenging the dominance of the Transformer. Developed by researchers from Stanford University, the University of California, Berkeley, the University of California, San Diego, and Meta, the architecture is intended to change how language models are built. Marking a milestone in the performance of models ranging from 125M to 1.3B parameters, TTT has drawn widespread attention across the industry.
The core innovation of the TTT architecture is to replace the hidden state of a traditional recurrent neural network (RNN) with a machine learning model, a change that improves both computational efficiency and performance. By compressing the context through actual gradient descent on the input tokens, the TTT architecture not only reduces complexity but also matches, and in some cases surpasses, the performance of leading models such as the Transformer and Mamba.
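To make the mechanism concrete, the following is a minimal sketch of the idea in Python/NumPy under simplified assumptions: the recurrent hidden state is the weight matrix of a small linear model, and each incoming token triggers one gradient-descent step on an assumed self-supervised reconstruction loss. The function name ttt_linear_layer, the loss, the learning rate, and the dimensions are illustrative assumptions for this sketch, not the paper's exact formulation.

import numpy as np

def ttt_linear_layer(tokens, dim, lr=0.1):
    """Process token embeddings of shape (seq_len, dim) with a TTT-style
    linear hidden state; returns the per-token outputs."""
    W = np.zeros((dim, dim))       # hidden state = weights of a linear model
    outputs = []
    for x in tokens:               # recurrent scan over the sequence
        # Assumed self-supervised objective: reconstruct the token,
        # loss(W) = 0.5 * ||W @ x - x||^2, so grad_W = (W @ x - x) x^T.
        grad = np.outer(W @ x - x, x)
        W = W - lr * grad          # one gradient step "trains" the hidden state
        outputs.append(W @ x)      # output comes from the updated model
    return np.stack(outputs)

# Usage: 8 random token embeddings of dimension 16 -> output of shape (8, 16).
rng = np.random.default_rng(0)
print(ttt_linear_layer(rng.normal(size=(8, 16)), dim=16).shape)

In this toy form the hidden state stays the same size no matter how long the sequence grows, which illustrates how training on the tokens at test time can act as a fixed-size compression of the context.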
Karan Dalal, one of the paper's lead authors, hailed the result, saying he believes the TTT architecture will fundamentally change how language models are researched. In a series of experiments comparing models at parameter scales from 125M to 1.3B, the TTT-Linear and TTT-MLP models matched the strongest Transformer and Mamba architectures and, on some metrics, even outperformed them, demonstrating their ability to handle large-scale language tasks.
The arrival of the TTT architecture heralds a potential new revolution in the AI domain, promising game-changing advancements in text generation, natural language understanding, machine translation, and other application areas. As this research is further explored and applied, the potential of the TTT architecture will be more fully realized, paving new paths for the future development of artificial intelligence.
Source: https://www.jiqizhixin.com/articles/2024-07-10-2