News Title: AI Painter’s Dilemma: Why Does Iced Coke Stump a Teacup?

Keywords: AI painter, text-to-image misalignment, Shanghai Jiao Tong University research

News Content:
AI image generation has advanced rapidly in recent years, but it still stumbles on some prompts. One intriguing case is “iced Coke in a teacup,” a seemingly simple idea that has become a Waterloo for AI painters. Models can understand and depict each object on its own, yet they struggle when asked to combine the two in an uncommon pairing. Even the most advanced models, such as DALL-E 3, have difficulty generating a correct image of “iced Coke in a teacup.”

Professor Wang Dequan’s team at Shanghai Jiao Tong University studied the problem in depth and will present a paper at the upcoming European Conference on Computer Vision (ECCV). They attribute the failure to “text-to-image misalignment”: the model fails to grasp a concept that is only implied in human language, so the teacup in the image is replaced by a transparent glass.

To address this issue, the researchers designed a system based on large language models (LLMs) to help collect more cases like “iced Coke in a teacup.” They explained the nature of the problem to the LLM and asked it to generate additional concept pairs. They then used text-to-image models to render each pair and had human raters judge the results.

Human evaluation was necessary because existing automated metrics are not suited to problems like “iced Coke in a teacup.” The researchers found that even the most capable models could not render every concept pair correctly within 20 generated images.
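The human-evaluation step described above can be pictured with a small tally. The following is only a minimal sketch, not the team’s actual tooling: the names `ConceptPair`, `success_rate`, and `summarize` are hypothetical, the “X in a Y” prompt template is an assumption modeled on the article’s example, and the only figure taken from the article is the 20 images.

```python
from __future__ import annotations

from dataclasses import dataclass

IMAGES_PER_PAIR = 20  # the article mentions 20 generated images


@dataclass(frozen=True)
class ConceptPair:
    """Hypothetical container for one pair, e.g. iced Coke + teacup."""
    content: str
    container: str

    def prompt(self) -> str:
        # Assumed prompt template, modeled on "iced Coke in a teacup".
        return f"{self.content} in a {self.container}"


def success_rate(judgments: list[bool]) -> float:
    """Fraction of images a human rater marked as matching the prompt."""
    if not judgments:
        raise ValueError("no judgments supplied")
    return sum(judgments) / len(judgments)


def summarize(labels: dict[ConceptPair, list[bool]]) -> dict[str, float]:
    """Map each prompt to its human-rated success rate."""
    return {pair.prompt(): success_rate(marks) for pair, marks in labels.items()}


if __name__ == "__main__":
    pair = ConceptPair("iced Coke", "teacup")
    # Placeholder judgments for illustration only; not results from the study.
    marks = [False] * IMAGES_PER_PAIR
    print(summarize({pair: marks}))
```

Each `True` or `False` stands for one rater’s verdict on one image, so the summary is simply the share of images judged faithful to the prompt.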

To solve this problem, the researchers proposed a method called “Mixture of Concept Experts” (MoCE), which aims to improve accuracy on prompts that involve implied concepts. The approach is expected to play a significant role in future AI image generation, helping models better understand and produce the images people actually want.

Source: https://www.jiqizhixin.com/articles/2024-08-06-11