
Revolutionizing 3D Asset Generation: Researchers Introduce Native 3D Universal Framework, LN3Diff, at ECCV 2024

In a groundbreaking development at the European Conference on Computer Vision (ECCV) 2024, researchers from the S-Lab at Nanyang Technological University (NTU), Shanghai AI Lab, and Peking University have presented a novel native 3D Latent Diffusion Model (LDM) generation framework. This cutting-edge technology addresses the limitations of existing 3D native generation models, offering improved scalability, efficiency, and generalization.

The AIxiv column, a platform run by Machine Heart (机器之心) for publishing academic and technical content, has carried more than 2,000 articles covering top laboratories at leading universities and companies worldwide, and has become an effective channel for academic exchange and dissemination. Researchers with strong work to share are encouraged to submit their papers or contact the editorial team for coverage; submissions can be sent to liyazhou@jiqizhixin.com or zhaoyunfeng@jiqizhixin.com.

The paper’s lead author, Lan Yushi, is a Ph.D. student at NTU supervised by Professor Chen Change Loy. He completed his undergraduate degree at Beijing University of Posts and Telecommunications, and his research interests center on neural-rendering-based 3D generative models, 3D reconstruction, and 3D editing.

The new framework, dubbed Latent Neural Fields 3D Diffusion (LN3Diff), tackles the poor scalability, low training efficiency, and weak generalization of current native 3D generation models. By combining a 3D VAE (Variational Autoencoder) with a 3D-DiT (Diffusion Transformer) in a two-stage pipeline, LN3Diff offers a general 3D generation recipe applicable to any neural field. Trained at scale on the Objaverse dataset, the method demonstrates strong performance across multiple benchmarks while also offering faster inference.
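
To make the two-stage design concrete, the sketch below mimics the recipe at toy scale: a VAE is first trained to compress (multi-view) observations into a compact latent, and a diffusion denoiser is then trained on the frozen latents with a standard noise-prediction objective. All module names, shapes, and hyperparameters here are illustrative placeholders, not the authors’ released code.

```python
# Minimal two-stage training sketch (assumption: modules and shapes are illustrative).
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyVAE(nn.Module):
    """Stand-in for the 3D-aware VAE: images -> latent (mu, logvar) -> reconstruction."""
    def __init__(self, in_ch=3, latent_ch=8):
        super().__init__()
        self.enc = nn.Conv2d(in_ch, 2 * latent_ch, 4, stride=4)        # downsample to a latent grid
        self.dec = nn.ConvTranspose2d(latent_ch, in_ch, 4, stride=4)   # reconstruct the input views

    def encode(self, x):
        mu, logvar = self.enc(x).chunk(2, dim=1)
        return mu, logvar

    def decode(self, z):
        return self.dec(z)

class ToyDenoiser(nn.Module):
    """Stand-in for the 3D-DiT: predicts the noise added to a latent at timestep t."""
    def __init__(self, latent_ch=8):
        super().__init__()
        self.net = nn.Conv2d(latent_ch + 1, latent_ch, 3, padding=1)

    def forward(self, z_t, t):
        t_map = t.float().view(-1, 1, 1, 1).expand(-1, 1, *z_t.shape[-2:]) / 1000.0
        return self.net(torch.cat([z_t, t_map], dim=1))

vae, denoiser = ToyVAE(), ToyDenoiser()
images = torch.rand(2, 3, 64, 64)                  # placeholder multi-view renderings

# --- Stage 1: VAE reconstruction + KL regularization ---
mu, logvar = vae.encode(images)
z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()
recon = vae.decode(z)
loss_vae = F.mse_loss(recon, images) + 1e-4 * (-0.5 * (1 + logvar - mu**2 - logvar.exp()).mean())

# --- Stage 2: latent diffusion with a standard noise-prediction objective ---
with torch.no_grad():                              # latents come from the frozen stage-1 encoder
    z0, _ = vae.encode(images)
t = torch.randint(0, 1000, (z0.shape[0],))
alpha_bar = torch.cos(t.float() / 1000 * torch.pi / 2).view(-1, 1, 1, 1) ** 2
noise = torch.randn_like(z0)
z_t = alpha_bar.sqrt() * z0 + (1 - alpha_bar).sqrt() * noise
loss_diff = F.mse_loss(denoiser(z_t, t), noise)
print(loss_vae.item(), loss_diff.item())
```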

For more information on the research, readers can visit the paper’s project homepage and access the code on GitHub. A Gradio demo is also available for interactive exploration. Lan Yushi’s personal website, https://nirvanalan.github.io/, offers additional insights into his work.

The paper, titled LN3Diff: Scalable Latent Neural Fields Diffusion for Speedy 3D Generation, emerges against the backdrop of rapid advancements in Neural Rendering, a field that has seen significant progress in recent years. Techniques centered on differentiable rendering and generative models have yielded impressive results in novel view synthesis, 3D editing, and 3D object generation. However, unlike the unified image/video generation LDM frameworks, native 3D generation models based on diffusion models still lack a universal approach.

Current methods, such as Score Distillation Sampling (SDS) distillation, are limited by long per-asset optimization times and saturation artifacts, while two-stage methods that rely on multi-view generation followed by feed-forward reconstruction are constrained by the quality and diversity of the generated views. These limitations have held back the performance and flexibility of 3D AI-generated content (AIGC).

The LN3Diff framework breaks new ground by introducing a native LDM-based 3D generation approach. By performing diffusion sampling directly in a 3D latent space, it enables efficient, high-quality 3D asset creation. At its core is a 3D-aware VAE that compresses the input into a compact latent representation well suited to the 3D modality.
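
Inference in such a latent diffusion framework reduces to ancestral sampling in the latent space followed by a single decoding pass. Below is a minimal, generic DDPM-style sampling loop; `denoiser` and `decode_to_neural_field` are placeholders standing in for the trained networks, not the released LN3Diff API.

```python
# Hedged sketch of inference in the 3D latent space: start from Gaussian noise,
# iteratively denoise with a learned predictor, then decode the clean latent.
import torch

def denoiser(z_t, t):
    # placeholder noise predictor; in practice this is the trained 3D-DiT
    return torch.zeros_like(z_t)

def decode_to_neural_field(z0):
    # placeholder for the VAE decoder that turns a latent into a renderable field
    return z0

T = 50
betas = torch.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)

z = torch.randn(1, 8, 16, 16)                      # pure-noise latent (shape is illustrative)
for t in reversed(range(T)):                       # DDPM-style ancestral sampling
    eps = denoiser(z, torch.tensor([t]))
    z0_hat = (z - (1 - alpha_bars[t]).sqrt() * eps) / alpha_bars[t].sqrt()
    if t > 0:
        mean = (alpha_bars[t - 1].sqrt() * betas[t] * z0_hat
                + alphas[t].sqrt() * (1 - alpha_bars[t - 1]) * z) / (1 - alpha_bars[t])
        z = mean + betas[t].sqrt() * torch.randn_like(z)
    else:
        z = z0_hat

asset = decode_to_neural_field(z)                  # a latent decoded into a renderable 3D field
```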

In the encoding stage, the 3D VAE takes multi-view images of a 3D object as input, which preserves texture-modeling capability and lets the model reuse the structure of 2D image encoders. The encoder ingests the multi-view images together with their corresponding depth maps and Plücker camera coordinates, and performs 3D-aware attention in token space to strengthen 3D consistency.
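
For illustration, the snippet below shows one common way to build such conditioning: per-pixel Plücker ray coordinates computed from the camera intrinsics and pose, concatenated with RGB and depth into a multi-channel encoder input. The 10-channel layout and variable names are assumptions made for this sketch, not the paper’s exact tensors.

```python
# Hedged sketch: build per-pixel Plücker ray coordinates and stack them with RGB + depth.
import torch

def plucker_rays(K, c2w, H, W):
    """Return a (6, H, W) map of Plücker coordinates (direction, origin x direction)."""
    v, u = torch.meshgrid(torch.arange(H, dtype=torch.float32),
                          torch.arange(W, dtype=torch.float32), indexing="ij")
    pix = torch.stack([u + 0.5, v + 0.5, torch.ones_like(u)], dim=-1)   # (H, W, 3) pixel coords
    dirs_cam = pix @ torch.linalg.inv(K).T                              # back-project to camera space
    dirs_world = dirs_cam @ c2w[:3, :3].T                               # rotate into world space
    dirs_world = dirs_world / dirs_world.norm(dim=-1, keepdim=True)
    origin = c2w[:3, 3].expand_as(dirs_world)                           # camera center for every pixel
    moment = torch.cross(origin, dirs_world, dim=-1)                    # o x d
    return torch.cat([dirs_world, moment], dim=-1).permute(2, 0, 1)     # (6, H, W)

H = W = 64
K = torch.tensor([[64.0, 0.0, 32.0], [0.0, 64.0, 32.0], [0.0, 0.0, 1.0]])
c2w = torch.eye(4)                                   # identity pose for the sketch
rgb = torch.rand(3, H, W)                            # one rendered view
depth = torch.rand(1, H, W)                          # its depth map
cond = plucker_rays(K, c2w, H, W)
encoder_input = torch.cat([rgb, depth, cond], dim=0)  # (10, H, W) per-view encoder input
print(encoder_input.shape)
```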

The decoder, designed to make the most of the compressed information, uses a 3D-aware mechanism to reconstruct the 3D neural field from the latent. Together, these components pave the way for a more versatile, efficient, and controllable 3D generation process, positioning LN3Diff as a promising solution for the future of 3D content creation.
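
As a rough illustration of what such a decoder can look like, the sketch below upsamples a latent into a tri-plane feature field (one common neural-field choice) and queries it at 3D points to obtain density and color. The tri-plane assumption and all module sizes are for exposition only and are not taken from the released decoder.

```python
# Illustrative toy decoder: latent -> three feature planes -> per-point field values.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyTriplaneDecoder(nn.Module):
    def __init__(self, latent_ch=8, plane_ch=16):
        super().__init__()
        self.plane_ch = plane_ch
        # upsample the compact latent into three feature planes (XY, XZ, YZ)
        self.to_planes = nn.Sequential(
            nn.ConvTranspose2d(latent_ch, 3 * plane_ch, 2, stride=2), nn.SiLU())
        self.mlp = nn.Sequential(nn.Linear(3 * plane_ch, 64), nn.SiLU(),
                                 nn.Linear(64, 4))           # (density, r, g, b)

    def forward(self, z, pts):
        planes = self.to_planes(z)                            # (B, 3*C, 2h, 2w)
        planes = planes.view(z.shape[0], 3, self.plane_ch, *planes.shape[-2:])
        coords = [pts[..., [0, 1]], pts[..., [0, 2]], pts[..., [1, 2]]]  # project onto 3 planes
        feats = []
        for i, uv in enumerate(coords):
            grid = uv.unsqueeze(1)                            # (B, 1, N, 2) in [-1, 1]
            f = F.grid_sample(planes[:, i], grid, align_corners=False)   # (B, C, 1, N)
            feats.append(f.squeeze(2).transpose(1, 2))        # (B, N, C)
        return self.mlp(torch.cat(feats, dim=-1))             # (B, N, 4) per-point field values

decoder = ToyTriplaneDecoder()
z = torch.randn(1, 8, 16, 16)                                 # a sampled 3D latent
pts = torch.rand(1, 4096, 3) * 2 - 1                          # query points in [-1, 1]^3
print(decoder(z, pts).shape)                                  # torch.Size([1, 4096, 4])
```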

As the field of computer vision continues to evolve, advancements like LN3Diff are set to transform the way 3D assets are generated, opening up new possibilities for applications in gaming, architecture, product design, and beyond. The impact of this breakthrough is likely to be far-reaching, fostering further innovation in the realm of 3D modeling and rendering.

【source】https://www.jiqizhixin.com/articles/2024-08-26-6
