### High Efficiency at a Low Cost: Sony AI Team Trains a 1.16-Billion-Parameter Diffusion Model for $1,890

In the field of artificial intelligence, generative models have attracted broad attention for their ability to create realistic visual content. Building these models, however, typically entails steep computational costs and massive datasets: training a diffusion model with roughly 1.2 billion parameters has traditionally demanded substantial resources and time. Researchers from Sony AI and other institutions have now achieved a breakthrough, training a 1.16-billion-parameter diffusion model for just $1,890, opening up new possibilities for the development of large-scale diffusion models.

#### Technical Innovation and Cost Control

This result was made possible by the team's innovative training strategies and tight cost control. By developing a low-cost, end-to-end training pipeline, they cut training costs by orders of magnitude relative to current state-of-the-art models. The approach not only reduces the hardware resources required for training but also uses data far more efficiently, reaching performance comparable to existing large models with only 37 million images.

#### Simplified Model Design and Efficient Computation

The team chose a vision-Transformer-based latent diffusion model for text-to-image generation, a design that is both simple and widely used in image generation. By applying a random masking strategy at the Transformer's input layer, they reduced computational overhead while preserving model performance. Going further, they introduced a technique called "deferred masking," which preprocesses the image patches before the main Transformer, so the model remains effective even at high masking ratios without incurring extra computational cost.
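The two ideas are easiest to see in code. Below is a minimal PyTorch sketch, not the authors' implementation: the class name `MaskedDiTInput`, the `patch_mixer` module, and the 75% masking ratio are all illustrative assumptions. The point is only that a light preprocessing pass runs over every patch before most patches are dropped, so the heavy backbone sees a much shorter sequence.

```python
import torch
import torch.nn as nn

class MaskedDiTInput(nn.Module):
    """Deferred masking sketch: mix all patches first, then drop most of them."""

    def __init__(self, dim: int = 1024, mask_ratio: float = 0.75):
        super().__init__()
        self.mask_ratio = mask_ratio
        # Lightweight "patch mixer" applied BEFORE masking, so patches that
        # are later dropped still inform the surviving ones (an assumption
        # about the preprocessing step, for illustration only).
        self.patch_mixer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=8, dim_feedforward=2 * dim, batch_first=True
        )

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, num_patches, dim) latent patch embeddings
        b, n, d = tokens.shape
        tokens = self.patch_mixer(tokens)  # cheap pass over ALL patches
        keep = max(1, int(n * (1.0 - self.mask_ratio)))
        # Random masking: keep a random subset of patch tokens, so the heavy
        # backbone attends over only ~25% of the sequence during training.
        idx = torch.rand(b, n, device=tokens.device).argsort(dim=1)[:, :keep]
        return tokens.gather(1, idx.unsqueeze(-1).expand(-1, -1, d))


x = torch.randn(4, 256, 1024)        # 4 images, a 16x16 grid of latent patches
print(MaskedDiTInput()(x).shape)     # torch.Size([4, 64, 1024])
```

Since attention cost grows quadratically with sequence length, shrinking the backbone's input from 256 to 64 tokens is where most of the compute savings would come from under these assumptions.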

#### Incorporating the Latest Architectural Advances

To further improve performance in large-scale training, the team also incorporated recent advances in Transformer architecture, including layer-wise scaling and sparse Transformers built with mixture-of-experts (MoE) layers. These innovations improve the model's efficiency and reduce experimental overhead, yielding better performance within the same compute budget.
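As a rough illustration of those two ideas, here is another hedged PyTorch sketch, again not the paper's code: a top-1-routed mixture-of-experts feed-forward layer (each token runs through only one expert, so parameters grow while per-token compute stays roughly flat) combined with a simple layer-wise scaling rule that widens the FFN hidden size with depth. Both the routing scheme and the linear scaling schedule are assumptions made for illustration.

```python
import torch
import torch.nn as nn

class MoEFeedForward(nn.Module):
    """Top-1-routed mixture-of-experts FFN (illustrative sketch)."""

    def __init__(self, dim: int, hidden: int, num_experts: int = 8):
        super().__init__()
        self.router = nn.Linear(dim, num_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, hidden), nn.GELU(), nn.Linear(hidden, dim))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        flat = x.reshape(-1, x.size(-1))            # (tokens, dim)
        choice = self.router(flat).argmax(dim=-1)   # route each token to 1 expert
        out = torch.zeros_like(flat)
        for e, expert in enumerate(self.experts):   # only the chosen tokens run
            sel = choice == e                       # through each expert, so
            if sel.any():                           # per-token compute stays
                out[sel] = expert(flat[sel])        # roughly constant as total
        return out.reshape_as(x)                    # parameter count grows


dim, depth = 1024, 12
# Layer-wise scaling (assumed linear schedule): later blocks get wider FFNs
# instead of every block using the same uniform width.
blocks = [
    MoEFeedForward(dim, hidden=int(dim * (2 + 2 * i / (depth - 1))))
    for i in range(depth)
]
print(blocks[0](torch.randn(2, 16, dim)).shape)     # torch.Size([2, 16, 1024])
```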

#### Combined Datasets and Cost-Effectiveness

Notably, the team was also creative in assembling the training dataset: in addition to real images, they mixed synthetic images into the dataset to further reduce cost and improve efficiency. On this combined dataset, $1,890 was enough to train a competitive model that reaches an FID (Fréchet Inception Distance) of 12.7 on zero-shot generation, at roughly 1/15 the cost of the state-of-the-art method, which costs $28,400 ($28,400 / $1,890 ≈ 15).

### Conclusion

This work from the Sony AI team demonstrates the potential of technical innovation to drive down the cost of AI training, and it offers new ideas and methods for future model development. By making efficient use of limited resources to train high-performance models, the advance could help AI technology reach far more fields and bring transformative change to the industry.


Source: https://www.jiqizhixin.com/articles/2024-07-29-4
