Apple researchers have recently introduced a new technique called Autoregressive Image Models (AIM). In a paper titled "Scalable Pre-training of Large Autoregressive Image Models," the researchers investigate whether training Vision Transformer (ViT) models with an autoregressive objective can deliver the same scaling behavior for image representation learning that Large Language Models (LLMs) exhibit for text.
The results show that model capacity scales readily to billions of parameters, and that AIM makes effective use of large quantities of uncurated image data. These findings challenge the perceived limits of traditional image representation learning and open new possibilities for the field of image processing.
AIM is a new family of image models developed by Apple that trains ViT models with an autoregressive objective, giving them the same kind of scalability in learning image representations that LLMs enjoy. The technique is expected to play an important role in areas such as image recognition, generation, and understanding.
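To make the autoregressive objective concrete, here is a minimal, hypothetical NumPy sketch: an image is split into a raster-ordered sequence of patches, and the loss is the error of predicting each patch from the patches before it. The `mean_predictor` below is a toy stand-in for the causal Transformer the paper actually trains; all names and parameters are illustrative, not the paper's implementation.

```python
import numpy as np

def patchify(image, patch=4):
    """Split an HxWxC image into a sequence of flattened patches (raster order)."""
    H, W, C = image.shape
    patches = [image[i:i + patch, j:j + patch].reshape(-1)
               for i in range(0, H, patch)
               for j in range(0, W, patch)]
    return np.stack(patches)  # shape: (num_patches, patch * patch * C)

def autoregressive_loss(seq, predictor):
    """Mean squared error of predicting patch t+1 from the prefix seq[:t+1]."""
    preds = [predictor(seq[: t + 1]) for t in range(len(seq) - 1)]
    targets = seq[1:]
    return float(np.mean((np.stack(preds) - targets) ** 2))

# Toy "model": predict the next patch as the mean of the prefix.
mean_predictor = lambda prefix: prefix.mean(axis=0)

rng = np.random.default_rng(0)
img = rng.random((8, 8, 3))         # tiny random "image"
seq = patchify(img, patch=4)        # 4 patches of 48 values each
loss = autoregressive_loss(seq, mean_predictor)
```

In a real training loop the predictor would be a ViT with causal attention masking, and the loss would be minimized by gradient descent over its parameters; the sketch only shows the shape of the objective.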
English title: Apple Releases Autoregressive Image Model AIM, Challenging Traditional Image Representation Learning
English keywords: Apple, Autoregressive Image Model, Image Representation Learning
Source: https://www.jiqizhixin.com/articles/2024-01-18-7