

Zhipu AI Open-Sources High-Performance Text-to-Image Model: CogView3-Plus-3B

A leap forward in open-source AI image generation.

Zhipu AI, a leading artificial intelligence research company, has announced the open-sourcing of its advanced text-to-image generation model, CogView3-Plus-3B, under the permissive Apache 2.0 license. This release marks a significant contribution to the open-source community, offering researchers and developers access to a model that rivals commercially available, state-of-the-art solutions in both quality and efficiency.

CogView3-Plus-3B builds upon its predecessor, CogView3, which already demonstrated superior performance compared to other open-source models. CogView3, a cascaded diffusion model, generates images in three stages: it first creates a 512×512 low-resolution image, then upscales it to 1024×1024, and finally to a high-resolution 2048×2048 image. In blind human evaluations, CogView3 outperformed SDXL, the leading open-source text-to-image diffusion model, with a win rate of 77.0%, while requiring approximately one-tenth of the inference time. (See the paper linked in the references for detailed methodology.)
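The cascaded relay described above can be sketched in a few lines. The code below is purely illustrative: both stage functions are hypothetical stand-ins (random noise and nearest-neighbour upsampling) for the actual learned diffusion stages, and only the resolution flow (512 → 1024 → 2048) mirrors the real pipeline.

```python
import numpy as np

def base_stage(prompt: str, size: int = 512) -> np.ndarray:
    # Hypothetical stand-in for the base text-to-image diffusion stage:
    # returns a random RGB image at the low starting resolution.
    rng = np.random.default_rng(0)
    return rng.random((size, size, 3), dtype=np.float32)

def upscale_stage(image: np.ndarray, factor: int = 2) -> np.ndarray:
    # Hypothetical stand-in for a super-resolution diffusion stage:
    # nearest-neighbour upsampling in place of learned refinement.
    return image.repeat(factor, axis=0).repeat(factor, axis=1)

def cascaded_generate(prompt: str) -> np.ndarray:
    img = base_stage(prompt, size=512)  # stage 1: 512x512
    img = upscale_stage(img)            # stage 2: 1024x1024
    img = upscale_stage(img)            # stage 3: 2048x2048
    return img
```

Each stage conditions on the previous stage's output, which is what lets the base model stay small and fast while the relay stages add detail.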

CogView3-Plus-3B significantly enhances CogView3 by incorporating the Diffusion Transformer (DiT) framework. This integration, coupled with Zero-SNR diffusion noise scheduling and a text-image joint attention mechanism, leads to substantial improvements in image quality and flexibility. Unlike the commonly used MMDiT architecture, CogView3-Plus-3B achieves this performance boost while maintaining efficiency in both training and inference, utilizing a latent dimension of 16 in its Variational Autoencoder (VAE). Furthermore, mixed-resolution training allows CogView3-Plus-3B to generate images at resolutions ranging from 512 to 2048 pixels, offering considerable versatility.
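Zero-SNR scheduling fixes a known flaw in common noise schedules: the final timestep retains a small amount of signal, so the model never trains on pure noise. A minimal NumPy sketch of the standard "rescale to zero terminal SNR" recipe is shown below; the exact schedule used inside CogView3-Plus-3B may differ in its details.

```python
import numpy as np

def rescale_zero_terminal_snr(betas: np.ndarray) -> np.ndarray:
    """Rescale a beta schedule so the terminal signal-to-noise ratio is zero."""
    alphas_bar_sqrt = np.sqrt(np.cumprod(1.0 - betas))

    a0 = alphas_bar_sqrt[0]    # signal level at the first timestep
    aT = alphas_bar_sqrt[-1]   # residual signal at the last timestep (> 0)

    # Shift so the last step has exactly zero signal, then rescale so the
    # first step keeps its original signal level.
    alphas_bar_sqrt = (alphas_bar_sqrt - aT) * a0 / (a0 - aT)

    # Convert the rescaled cumulative products back into per-step betas.
    alphas_bar = alphas_bar_sqrt ** 2
    alphas = np.concatenate([alphas_bar[:1], alphas_bar[1:] / alphas_bar[:-1]])
    return 1.0 - alphas

betas = np.linspace(1e-4, 0.02, 1000)       # a typical linear schedule
new_betas = rescale_zero_terminal_snr(betas)
```

With the rescaled schedule, the fully-noised sample at the final timestep carries no leftover image signal, which matches the pure-noise starting point used at sampling time.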

Benchmark tests indicate that CogView3-Plus-3B achieves performance on par with the leading commercial text-to-image models. This is a crucial development, as it democratizes access to cutting-edge AI image generation technology. The open-source nature of CogView3-Plus-3B fosters collaboration and innovation within the AI community, accelerating the development and refinement of text-to-image models.

The release of CogView3-Plus-3B underscores Zhipu AI’s commitment to open research and its dedication to advancing the field of artificial intelligence. The model’s strong performance, combined with its ease of access and permissive license, promises to significantly impact various applications, from creative content generation to scientific visualization. This open-source contribution paves the way for further breakthroughs in the rapidly evolving landscape of AI-powered image synthesis.

References:

* Zhipu AI. (Date of release). CogView3-Plus-3B Model Release. [Link to the official announcement (if available)]
* CogView3 paper: https://arxiv.org/abs/2403.05121


