Zhipu AI has once again made headlines with the launch of a series of next-generation foundation models, each boasting performance that places it at the forefront of international standards. At the KDD 2024 conference, the company showcased its latest advancements, including the language model GLM-4-Plus, the text-to-image model CogView-3-Plus, the image/video understanding model GLM-4V-Plus, and the video generation model CogVideoX. These models have set new benchmarks in their respective domains, marking a significant milestone for Zhipu AI.
A Leap Forward in AI Technology
Since the release of the first-generation language foundation model ChatGLM in March 2023, Zhipu AI has been committed to deepening its research into foundation models. The result is a series of cutting-edge models that have been meticulously developed to enhance performance and expand multimodal capabilities.
GLM-4-Plus: A Breakthrough in Language Understanding
The GLM-4-Plus model stands out for its comprehensive improvements in language understanding, instruction following, and long-text processing. Through extensive theoretical research and the construction of massive high-quality datasets, Zhipu AI has leveraged techniques like PPO to enhance the model’s reasoning and instruction adherence. The GLM-4-Plus model now matches the performance of top-tier models like GPT-4o, maintaining its position at the international forefront.
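The announcement does not detail Zhipu AI's exact training recipe, but for readers unfamiliar with PPO, the sketch below shows the standard clipped surrogate loss that PPO-style alignment typically optimizes; this is the textbook formulation, not a description of GLM-4-Plus internals.

```python
import torch

def ppo_clipped_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    """Standard PPO clipped surrogate loss (negated so it can be minimized).

    logp_new / logp_old: log-probabilities of the sampled tokens under the
    current and the frozen (pre-update) policy; advantages: advantage estimates.
    """
    ratio = torch.exp(logp_new - logp_old)                      # pi_new / pi_old
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    return -torch.min(unclipped, clipped).mean()
```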
CogView-3-Plus: Pushing the Boundaries of Image Generation
The CogView-3-Plus model has revolutionized image generation by replacing the traditional UNet architecture with a Transformer-based approach. This, combined with an in-depth study of noise scheduling in diffusion models, has significantly improved the model’s performance. The model’s ability to generate images that align closely with instructions and have high aesthetic scores has been validated, placing it on par with leading models like MJ-V6 and FLUX.
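The article does not include usage details, but a minimal sketch of generating an image through Zhipu AI's open platform might look like the following. The zhipuai Python SDK call and the cogview-3-plus model identifier are assumptions for illustration, not details confirmed by the announcement.

```python
# Hypothetical sketch: text-to-image via the zhipuai SDK (model name assumed).
from zhipuai import ZhipuAI

client = ZhipuAI(api_key="YOUR_API_KEY")   # key issued on bigmodel.cn

response = client.images.generations(
    model="cogview-3-plus",                # assumed identifier for CogView-3-Plus
    prompt="A watercolor painting of a lighthouse at dawn",
)
print(response.data[0].url)                # URL of the generated image
```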
GLM-4V-Plus: A Multimodal Marvel
Building on the success of the CogVLM series, the GLM-4V-Plus model has been developed to excel in both image and video understanding. Its superior image comprehension and temporal awareness make it the first general-purpose image & video understanding model API available in China.
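To make the API concrete, here is a minimal sketch of an image-understanding request. The glm-4v-plus identifier and the multimodal message format are assumptions based on the SDK's OpenAI-style interface rather than details given in the article.

```python
# Hypothetical sketch: image understanding with GLM-4V-Plus via the zhipuai SDK.
from zhipuai import ZhipuAI

client = ZhipuAI(api_key="YOUR_API_KEY")

response = client.chat.completions.create(
    model="glm-4v-plus",                   # assumed identifier
    messages=[{
        "role": "user",
        "content": [
            {"type": "image_url",
             "image_url": {"url": "https://example.com/frame.jpg"}},
            {"type": "text", "text": "Describe what is happening in this image."},
        ],
    }],
)
print(response.choices[0].message.content)
```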
Innovative Features and Services
In addition to the advanced models, Zhipu AI has also introduced several new features and services aimed at enhancing user experience and accessibility.
QingYan App: Video Calling for the Masses
The QingYan app has been upgraded to include a video calling feature, making it the first consumer-facing video calling service of its kind in China. The feature seamlessly integrates text, audio, and video modalities and provides real-time reasoning capabilities. Users can enjoy smooth interactions, with the app responding quickly to interruptions and accurately executing commands.
GLM-4-Flash API: Free and Accessible
Zhipu AI has made the GLM-4-Flash API available for free, enabling users to rapidly and cost-effectively build custom models and applications. This API, which offers significant advantages in terms of speed and performance, is the first completely free large model API on Zhipu AI’s open platform (bigmodel.cn).
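A minimal sketch of calling the free model is shown below, assuming the zhipuai Python SDK's OpenAI-style chat interface; consult the bigmodel.cn documentation for the authoritative usage.

```python
# Minimal sketch: calling the free GLM-4-Flash model through the zhipuai SDK.
from zhipuai import ZhipuAI

client = ZhipuAI(api_key="YOUR_API_KEY")   # key from bigmodel.cn

response = client.chat.completions.create(
    model="glm-4-flash",
    messages=[
        {"role": "user", "content": "Summarize the GLM-4 model family in one sentence."},
    ],
)
print(response.choices[0].message.content)
```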
Open Source: CogVideoX
True to its commitment to open-source innovation, Zhipu AI has released the CogVideoX-5B model, a more capable version of the previously open-sourced CogVideoX-2B. The company has also changed the license of CogVideoX-2B to the more permissive Apache 2.0, contributing to the flourishing of the AI video generation community.
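Because the weights are openly released, local inference is possible; the sketch below uses the Hugging Face diffusers integration. The pipeline class and the THUDM/CogVideoX-5b checkpoint name are drawn from the public release rather than this article, so treat them as assumptions to verify.

```python
# Hypothetical sketch: generating a short clip with the open-source CogVideoX-5B
# weights via diffusers (pipeline class and checkpoint name assumed).
import torch
from diffusers import CogVideoXPipeline
from diffusers.utils import export_to_video

pipe = CogVideoXPipeline.from_pretrained(
    "THUDM/CogVideoX-5b", torch_dtype=torch.bfloat16
).to("cuda")

video = pipe(
    prompt="A panda riding a bicycle through a bamboo forest",
    num_inference_steps=50,
    num_frames=49,
).frames[0]

export_to_video(video, "panda.mp4", fps=8)   # write frames to an MP4 file
```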
Conclusion
Zhipu AI’s latest achievements at KDD 2024 are a testament to its dedication to advancing AI technology. With these next-generation foundation models, the company continues to lead the way in innovation, providing users with cutting-edge tools and services that push the boundaries of what is possible in AI. As Zhipu AI moves forward with its mission to make machines think like humans, the future looks promising for the AI industry.