
Apple and UCSB Jointly Open-Source the Image Editing Framework MGIE, Improving Image Editing Efficiency

Recently, researchers from Apple and the University of California, Santa Barbara (UCSB) jointly open-sourced the image editing framework MGIE (MLLM-Guided Image Editing). The framework applies multimodal large language models (MLLMs) to image editing, addressing the problem of insufficient instruction guidance.

At the core of MGIE, an MLLM learns to derive concise expressive instructions and provides explicit, visually grounded guidance. Through end-to-end training, the diffusion model is updated jointly and performs the edit using a latent imagination of the intended goal.

Guided by human instructions, MGIE can carry out Photoshop-style modifications, global photo optimization, and local object edits. For example, a user can enter "Replace the sky in the image with a sunset" or "Replace the person in the image with a cat," and MGIE will automatically generate an edited image that meets the request.

Compared with traditional image editing tools, the MGIE framework offers the following advantages:

* Simpler instruction guidance: by using an MLLM to learn concise expressive instructions, MGIE lets users edit images without mastering complex editing techniques.
* Stronger visual relevance: the visually grounded guidance MGIE provides makes edits more precise and better aligned with user intent.
* End-to-end training: MGIE is trained end to end, with the diffusion model updated jointly, which helps ensure both the quality and the efficiency of the edits.

The open-sourcing of MGIE should significantly advance research and applications in image editing. Researchers and developers can build new image editing tools and techniques on top of the framework, improving the efficiency and creativity of image editing.

Researchers at Apple and UCSB say they hope MGIE will bring new breakthroughs to image editing and give users a more convenient and powerful editing experience.

The English version follows:

**Headline: Apple and UCSB Release MGIE: AI-Powered Image Editing Enters a New Era**

**Keywords:** Image Editing, AI-Guided, End-to-End

**News Content:** Apple and UCSB Collaborate to Open-Source Image Editing Framework MGIE, Enhancing Image Editing Efficiency

Recently, researchers from Apple and the University of California, Santa Barbara (UCSB) have jointly open-sourced an image editing framework called MGIE (MLLM-Guided Image Editing). This framework applies multimodal large language models (MLLMs) to image editing, addressing the issue of insufficient instruction guidance.

At the core of the MGIE framework is the use of MLLM to learn concisely expressed instructions and provide clear visual-semantic guidance. Through end-to-end training, the diffusion model is updated synchronously and performs image editing guided by the latent imagination of the desired target.
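The passage above describes a two-stage design: the MLLM first expands the user's terse instruction into explicit, visually grounded guidance, and a diffusion model then performs the edit conditioned on that guidance. The following is a minimal, purely illustrative PyTorch sketch of that flow; `EditRequest`, `MLLMInstructionExpander`, `InstructionGuidedDiffusionEditor`, and `edit_image` are hypothetical names and toy stand-ins, not the interfaces of the released MGIE code.

```python
# Conceptual sketch of the two-stage, instruction-guided editing pipeline
# described above. All class and function names are hypothetical
# placeholders, not the actual API of the open-sourced MGIE code.
import torch
from dataclasses import dataclass


@dataclass
class EditRequest:
    image: torch.Tensor   # source image, shape (3, H, W), values in [0, 1]
    instruction: str      # terse user instruction, e.g. "make it look like sunset"


class MLLMInstructionExpander(torch.nn.Module):
    """Stands in for the multimodal LLM that turns a terse instruction plus
    the source image into an expressive, visually grounded guidance
    embedding. A toy embedding over character codes replaces the real MLLM."""

    def __init__(self, vocab_size: int = 256, dim: int = 64, max_tokens: int = 8):
        super().__init__()
        self.max_tokens = max_tokens
        self.embed = torch.nn.Embedding(vocab_size, dim)

    def forward(self, request: EditRequest) -> torch.Tensor:
        ids = torch.tensor([[ord(c) % 256 for c in request.instruction[: self.max_tokens]]])
        return self.embed(ids)  # (1, tokens, dim) guidance embedding


class InstructionGuidedDiffusionEditor(torch.nn.Module):
    """Stands in for the latent diffusion model that performs the edit,
    conditioned on the MLLM-derived guidance. Real MGIE runs iterative
    denoising; this placeholder only predicts a simple conditioned residual."""

    def __init__(self, dim: int = 64):
        super().__init__()
        self.to_shift = torch.nn.Linear(dim, 3)               # guidance -> per-channel shift
        self.refine = torch.nn.Conv2d(3, 3, kernel_size=3, padding=1)

    def forward(self, image: torch.Tensor, guidance: torch.Tensor) -> torch.Tensor:
        shift = self.to_shift(guidance.mean(dim=1))           # (1, 3)
        conditioned = image.unsqueeze(0) + shift.view(1, 3, 1, 1)
        return (conditioned + self.refine(conditioned)).squeeze(0)


def edit_image(request: EditRequest,
               expander: MLLMInstructionExpander,
               editor: InstructionGuidedDiffusionEditor) -> torch.Tensor:
    """Inference path: expand the instruction, then run the guided edit."""
    guidance = expander(request)
    return editor(request.image, guidance)
```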

Guided by human instructions, MGIE can perform Photoshop-style editing, global photo enhancements, and local object manipulations. For example, users can input “Replace the sky in the image with a sunset” or “Replace the person in the image with a cat,” and the MGIE framework will automatically generate edited images that meet the requirements.
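For illustration, the hypothetical `edit_image` sketch above could be driven by one of the example instructions quoted in the article:

```python
# Illustrative call into the toy pipeline sketched earlier; the output of
# the placeholder editor is not a meaningful edit, only a shape check.
request = EditRequest(
    image=torch.rand(3, 512, 512),
    instruction="Replace the sky in the image with a sunset",
)
edited = edit_image(request, MLLMInstructionExpander(), InstructionGuidedDiffusionEditor())
print(edited.shape)  # torch.Size([3, 512, 512])
```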

Compared with traditional image editing tools, the MGIE framework has the following advantages:

* **Easier Instruction Guidance:** The MGIE framework learns concisely expressed instructions through MLLM, eliminating the need for users to master complex image editing techniques.
* **Stronger Visual Relevance:** The visual-semantic guidance provided by the MGIE framework makes image editing more precise and in line with user intent.
* **End-to-End Training:** The MGIE framework adopts an end-to-end training approach, where the diffusion model is updated synchronously, ensuring the quality and efficiency of image editing (see the training sketch after this list).
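Below is a minimal sketch of what such a "synchronous" end-to-end update could look like, reusing the hypothetical modules from the earlier sketch; the loss and optimizer settings are illustrative assumptions, not the published MGIE recipe.

```python
# Minimal sketch of a "synchronous" end-to-end update, reusing the
# hypothetical modules defined in the earlier sketch. The MSE loss and
# hyperparameters are illustrative assumptions, not the MGIE recipe.
expander = MLLMInstructionExpander()
editor = InstructionGuidedDiffusionEditor()
optimizer = torch.optim.AdamW(
    list(expander.parameters()) + list(editor.parameters()), lr=1e-4
)

request = EditRequest(
    image=torch.rand(3, 64, 64),
    instruction="make the sky look like a sunset",
)
target = torch.rand(3, 64, 64)  # ground-truth edited image from a paired dataset

guidance = expander(request)                              # MLLM-derived guidance
prediction = editor(request.image, guidance)              # guided edit
loss = torch.nn.functional.mse_loss(prediction, target)   # stand-in for the diffusion loss

optimizer.zero_grad()
loss.backward()   # gradients flow into both the expander and the editor
optimizer.step()  # one step updates both modules together
```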

The open-sourcing of the MGIE framework will greatly promote research and applications in the field of image editing. Researchers and developers can build new image editing tools and techniques based on this framework, enhancing the efficiency and creativity of image editing.

Researchers from Apple and UCSB expressed their hope that the MGIE framework will bring new breakthroughs to the field of image editing and provide users with a more convenient and powerful image editing experience.

【Source】https://www.jiqizhixin.com/articles/2024-02-05-10
