News Title: “Transformer Pioneers Gather at GTC to Discuss the Next Chapter Beyond Transformers”
Keywords: Transformer authors, Jensen Huang roundtable, large model evolution
News Content: At this year’s GPU Technology Conference (GTC), NVIDIA founder Jensen Huang hosted a groundbreaking roundtable that brought together seven of the eight original authors of the Transformer paper; the remaining author, Niki Parmar, was unable to attend due to unforeseen circumstances. This marked the first public appearance of the Transformer paper’s authors as a group and attracted widespread attention across the industry.
During the dialogue, the authors shared considered views on the future of the Transformer. While acknowledging its remarkable achievements, they said they hope more advanced architectures will eventually surpass it and push AI performance to new heights. The Transformer was originally designed to model the dynamic evolution of tokens, going beyond simple linear generation in order to capture the complex transformations of text and code.
The authors also pointed out that current large models might utilize trillions of parameters to solve simple problems like “2+2,” raising questions about computational efficiency. They envision adaptive computing as the future trend, where computational resources are intelligently adjusted according to the complexity of the problem.
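To make the adaptive-computation idea concrete, here is a minimal early-exit sketch: a toy layer stack stops as soon as a confidence proxy crosses a threshold, so easy inputs consume fewer layers than hard ones. The names (layer, halting_score, threshold) and the saturation-based confidence proxy are illustrative assumptions, not a method described by the panelists.

    import numpy as np

    rng = np.random.default_rng(0)

    def layer(h, w):
        # One toy "layer": a linear map squashed by tanh.
        return np.tanh(h @ w)

    def halting_score(h):
        # Illustrative confidence proxy: approaches 1 as activations saturate.
        return float(np.mean(np.abs(h)))

    def adaptive_forward(h, weights, threshold=0.6):
        # Run layers until the score crosses the threshold, then exit early,
        # so the compute spent tracks the difficulty of the input.
        depth = 0
        for w in weights:
            depth += 1
            h = layer(h, w)
            if halting_score(h) >= threshold:
                break
        return h, depth

    d = 16
    weights = [rng.normal(scale=0.5, size=(d, d)) for _ in range(12)]
    easy = rng.normal(scale=2.0, size=d)   # large activations saturate quickly
    hard = rng.normal(scale=0.05, size=d)  # small activations need more layers
    for name, x in (("easy", easy), ("hard", hard)):
        _, depth = adaptive_forward(x, weights)
        print(name, "input exited after", depth, "layer(s)")

Real systems realize the same principle with learned mechanisms such as adaptive computation time or mixture-of-experts routing; the sketch only shows the budget-follows-difficulty behavior.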
Additionally, they mentioned that models have become remarkably cheap to use: roughly $1 USD processes a million tokens, about 100 times cheaper than buying a paperback book. This highlights the affordability and accessibility of AI models, but it also points to both the potential and the necessity of further scaling.
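The paperback comparison is easy to sanity-check with back-of-envelope arithmetic. The $1 per million tokens figure is the one quoted above; the paperback price and length below are assumed round numbers, not figures from the panel.

    # Back-of-envelope check of the "about 100x cheaper than a paperback" claim.
    model_cost_per_mtok = 1.00       # USD per million tokens, as quoted
    book_price = 15.00               # USD, assumed typical paperback price
    book_tokens = 100_000 * 1.3      # assumed ~100k words at ~1.3 tokens per word

    model_cost_per_token = model_cost_per_mtok / 1_000_000
    book_cost_per_token = book_price / book_tokens

    print("model:     $%.8f per token" % model_cost_per_token)
    print("paperback: $%.8f per token" % book_cost_per_token)
    print("ratio:     ~%.0fx" % (book_cost_per_token / model_cost_per_token))

Under these assumptions the ratio comes out near 115x, consistent with the roughly 100x figure cited in the discussion.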
The forum not only served as a historic retrospective of the Transformer model but also fostered a forward-looking discussion on the future direction of AI, emphasizing computational efficiency and model performance as key research areas. Source: Tencent Technology.
[Source] https://new.qq.com/rain/a/20240321A00W5H00