Zhipu AI Unveils New Generation of AI Models with Leading Global Performance at KDD 2024

Zhipu AI Unveils New Generation of Foundation Models, Leading the Pack in Performance

Beijing, China – Zhipu AI, a leading artificial intelligence research and development company, has announced the release of a new generation of foundation models, showcasing significant advancements in performance and modality. These models, unveiled at the KDD 2024 conference, include the language model GLM-4-Plus, the text-to-image model CogView-3-Plus, the image/video understanding model GLM-4V-Plus, and the video generation model CogVideoX. According to the company, each model has achieved top-tier performance in its respective field, solidifying Zhipu AI’s position as a global leader in AI innovation.

GLM-4-Plus: A New Benchmark in Language Understanding

Zhipu AI’s latest language model, GLM-4-Plus, demonstrates significant improvements in language understanding, instruction following, and long-text processing. The model has been trained on a large volume of high-quality data, leveraging techniques such as Proximal Policy Optimization (PPO) to enhance its reasoning and instruction-following capabilities. GLM-4-Plus achieves performance on par with leading models like GPT-4o, placing it at the forefront of language model development. The model’s ability to process long texts has been further enhanced through a refined strategy of mixing short and long text data in training, allowing it to excel in complex reasoning tasks. GLM-4-Plus is now available on the Zhipu AI open platform (bigmodel.cn) through API services and will soon be accessible through the Qingyan APP.

CogView-3-Plus: Bridging the Gap in Text-to-Image Generation

In the realm of text-to-image generation, Zhipu AI’s CogView-3-Plus model has made significant strides. By replacing the traditional UNet architecture with a Transformer architecture for training the diffusion model, and by conducting in-depth research on noise scheduling, Zhipu AI has achieved a substantial optimization of model performance. The model’s scale-up efficiency has been validated, demonstrating the benefits of increasing model parameters. Furthermore, the development of a high-quality image fine-tuning dataset has enabled CogView-3-Plus to leverage its vast pre-trained knowledge base to generate images that are more closely aligned with user instructions and achieve higher aesthetic scores. The model’s performance is now comparable to leading models like MJ-V6 and FLUX, placing it among the top performers in the field. CogView-3-Plus is currently available on the open platform (bigmodel.cn) through API services and is integrated into the Qingyan APP.
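For readers unfamiliar with the term, a noise schedule determines how much of the original image signal survives at each step of a diffusion process. The announcement does not say which schedules Zhipu AI studied, so the sketch below is only an illustration of the concept, comparing two well-known schedules from the diffusion literature (a linear beta schedule and the cosine schedule). The function names and default parameters are assumptions chosen for this example and do not describe CogView-3-Plus.

```python
import math


def linear_alpha_bar(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative signal fraction per step under a linear beta schedule.

    Illustrative defaults taken from common practice in the literature,
    not from Zhipu AI.
    """
    alpha_bar = []
    running = 1.0
    for t in range(num_steps):
        # Noise variance added at step t grows linearly with t.
        beta = beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        running *= 1.0 - beta
        alpha_bar.append(running)
    return alpha_bar


def cosine_alpha_bar(num_steps, s=0.008):
    """Cumulative signal fraction per step under the cosine schedule."""

    def f(t):
        return math.cos((t / num_steps + s) / (1 + s) * math.pi / 2) ** 2

    # Normalise by f(0) so the schedule starts close to 1.
    return [f(t + 1) / f(0) for t in range(num_steps)]


if __name__ == "__main__":
    steps = 1000
    linear = linear_alpha_bar(steps)
    cosine = cosine_alpha_bar(steps)
    # Print how much signal remains at a few points along each schedule.
    for t in (0, 249, 499, 749, 999):
        print(f"step {t + 1:4d}  linear={linear[t]:.4f}  cosine={cosine[t]:.4f}")
```

Running the script prints the remaining signal fraction at several steps for both schedules. The cosine schedule destroys information more gradually through the middle of the process, which is one reason schedule design can matter for image quality.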

GLM-4V-Plus: Unlocking the Power of Multimodal Understanding

Building upon the research experience gained from the CogVLM series of models, Zhipu AI has developed GLM-4V-Plus, a multimodal model that excels in both image and video understanding. GLM-4V-Plus not only comprehends and analyzes complex video content but also possesses strong temporal awareness. This model is now available on the open platform (bigmodel.cn) and is the first general-purpose image and video understanding model API in China.

Qingyan APP: Revolutionizing Communication with Video Calls

Leveraging its deep expertise in model development, Zhipu AI has introduced a video call feature to the Qingyan APP, marking the first consumer-facing (C-end) video call service in China. This feature spans text, audio, and video modalities and incorporates real-time reasoning capabilities. Users can enjoy seamless interactions during Qingyan video calls, with the model responding promptly even when interrupted. Qingyan not only understands the visuals captured by the camera but also accurately interprets and executes user commands, providing an experience akin to a video call with a real person. The video call feature of the Qingyan APP will be rolled out to a select group of users on August 30th, and applications for access from external users are being accepted. Zhipu AI plans to continuously iterate on and optimize the feature, gradually expanding its availability until it reaches all users.

CogVideoX: Open-Sourcing Innovation

Zhipu AI remains committed to open-sourcing its most advanced models to empower a wider developer community and drive progress in the field of AI. Following the release and open-sourcing of the 2B version, the 5B version of CogVideoX is now also open-source. The larger version offers improved performance, which the company says makes it the best choice among open-source video generation models.

A Commitment to Continuous Innovation

Zhipu AI’s latest advancements in foundation models demonstrate its commitment to pushing the boundaries of AI technology. The company’s dedication to research, development, and open-sourcing is driving the development of more powerful and versatile AI solutions that are transforming industries and enhancing human capabilities. As Zhipu AI continues to innovate, the future of AI promises to be even more exciting and transformative.