News Title: Peking University Breakthrough: AI Follows Prompts More Closely, Producing More Realistic Human-Object Interaction Images

Keywords: Peking University AI, Image Generation, Semantic Perception

News Content:

Title: Peking University Launches New Framework to Improve the Quality of Human-Object Interaction Image Generation

The MIPL Lab at Peking University's Wangxuan Institute of Computer Technology recently published a paper on a human-object interaction image generation framework called SA-HOI. The framework aims to improve the quality and realism of generated human-object interaction images by exploiting human pose and interaction semantics.

The researchers propose a pose- and interaction-aware generation framework. By incorporating image inversion, it implements an iterative inversion and image refinement pipeline in which the generated image is corrected step by step, improving its quality with each round. The team also introduces a comprehensive benchmark for human-object interaction image generation, together with evaluation metrics designed specifically for the task.
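The control flow of such an invert-then-refine loop can be sketched as follows. This is only an illustration of the general idea, not the paper's implementation: the function `iterative_refinement`, its parameters, and the toy stand-ins (`toy_invert`, `toy_denoise`, `toy_quality`, `TARGET`) are all hypothetical names introduced here, and a short list of floats stands in for an image.

```python
from typing import Callable, List, Tuple

Image = List[float]  # toy stand-in for an image or latent tensor


def iterative_refinement(
    image: Image,
    invert: Callable[[Image], Image],
    denoise_with_guidance: Callable[[Image], Image],
    quality: Callable[[Image], float],
    rounds: int = 3,
) -> Tuple[Image, List[float]]:
    """Repeat invert -> guided denoise, recording a quality score per round."""
    scores = [quality(image)]
    for _ in range(rounds):
        latent = invert(image)                 # map the image back to a noisy latent
        image = denoise_with_guidance(latent)  # regenerate it under guidance
        scores.append(quality(image))
    return image, scores


# Toy stand-ins (assumptions for illustration only).
TARGET = [1.0, 0.0, 1.0, 0.0]  # pretend "ideal" image


def toy_invert(img: Image) -> Image:
    # Pull every value halfway back toward a neutral grey of 0.5.
    return [0.5 * x + 0.25 for x in img]


def toy_denoise(latent: Image) -> Image:
    # Move each value 80% of the way toward the target.
    return [l + 0.8 * (t - l) for l, t in zip(latent, TARGET)]


def toy_quality(img: Image) -> float:
    # Higher is better: negative squared error against the target.
    return -sum((x - t) ** 2 for x, t in zip(img, TARGET))


final, scores = iterative_refinement(
    [0.5, 0.5, 0.5, 0.5], toy_invert, toy_denoise, toy_quality
)
```

In this toy setup each round raises the quality score, which mirrors the step-by-step self-correction described above. In a real system the inversion and denoising callables would wrap a diffusion model.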

The paper describes the SA-HOI framework in detail, covering the design of the pose and interaction guidance and the implementation of the iterative inversion and refinement pipeline. The method first generates an initial image with Stable Diffusion. A pose detector then locates the human body joints, and these are used to build a pose mask that highlights regions where the pose quality is low. In parallel, a segmentation model locates the interaction boundary region, from which an interaction mask is built to strengthen the semantic expression along that boundary. At each denoising step, the pose and interaction masks are combined as constraints to refine the image, which reduces generation artifacts.
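The mask construction described above can be illustrated with a small pure-Python sketch. The article does not give the paper's formulas, so everything here is an assumption: detector confidence is used as a proxy for pose quality, the interaction boundary is taken to be the pixels where the human and object segments touch, the two masks are merged with an element-wise maximum, and the correction is a simple masked blend. All function names are hypothetical.

```python
from typing import List, Tuple

Mask = List[List[float]]


def pose_mask(h: int, w: int, keypoints: List[Tuple[int, int, float]],
              conf_threshold: float = 0.5, radius: int = 1) -> Mask:
    """Mark a square patch around every low-confidence joint (x, y, conf)."""
    mask = [[0.0] * w for _ in range(h)]
    for x, y, conf in keypoints:
        if conf >= conf_threshold:
            continue  # confident joints are assumed to be well formed
        for r in range(max(0, y - radius), min(h, y + radius + 1)):
            for c in range(max(0, x - radius), min(w, x + radius + 1)):
                mask[r][c] = 1.0
    return mask


def interaction_mask(human: Mask, obj: Mask) -> Mask:
    """Mark pixel pairs where a human pixel is 4-adjacent to an object pixel."""
    h, w = len(human), len(human[0])
    mask = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            if not human[r][c]:
                continue
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < h and 0 <= cc < w and obj[rr][cc]:
                    mask[r][c] = 1.0
                    mask[rr][cc] = 1.0
    return mask


def combine(a: Mask, b: Mask) -> Mask:
    """Element-wise maximum of two masks (assumed combination rule)."""
    return [[max(x, y) for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def masked_update(original: Mask, corrected: Mask, mask: Mask) -> Mask:
    """Take corrected values inside the mask and keep the original elsewhere."""
    return [[m * cv + (1 - m) * ov for ov, cv, m in zip(ro, rc, rm)]
            for ro, rc, rm in zip(original, corrected, mask)]


# 4x4 toy scene: the human fills columns 0-1, the object sits at rows 1-2, column 2.
human_seg = [[1.0, 1.0, 0.0, 0.0] for _ in range(4)]
object_seg = [[0.0] * 4, [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 1.0, 0.0], [0.0] * 4]
joints = [(0, 0, 0.9), (3, 3, 0.2)]  # (x, y, confidence); the second is low quality

p_mask = pose_mask(4, 4, joints)
i_mask = interaction_mask(human_seg, object_seg)
combined = combine(p_mask, i_mask)
updated = masked_update([[0.0] * 4 for _ in range(4)],
                        [[1.0] * 4 for _ in range(4)], combined)
```

Here `combined` covers both the patch around the low-confidence joint and the strip where the human touches the object, and `masked_update` changes only those pixels. In a diffusion pipeline the same kind of mask would typically be applied to latents or guidance signals at each denoising step rather than to raw pixel values.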

Extensive experiments show that SA-HOI outperforms existing diffusion-based image generation methods on both the metrics designed for human-object interaction image generation and conventional image generation metrics. Beyond advancing image generation, the work offers new ideas and technical support for applying AI to visual content creation and virtual reality.

The MIPL Lab has reportedly published a number of papers at top computer vision conferences in recent years and has won first place in several major competitions in China and abroad. The SA-HOI framework may prove to be an important milestone for the field and could open new directions for follow-up research and applications.

Source: https://www.jiqizhixin.com/articles/2024-08-08-4