Alibaba Open-Sources New Generation of Qwen2 Large Language Model
Hangzhou, China – Alibaba Cloud has announced the open-sourcing of its new generation of Qwen2 large language models (LLMs), marking a significant advancement in the field of artificial intelligence. The Qwen2 series, developed by Alibaba's Tongyi Qianwen team, comprises five models ranging in size from 0.5B to 72B parameters, offering a diverse range of capabilities for various applications.
Enhanced Capabilities and Performance
Qwen2 models boast significant improvements in natural language understanding, code writing, mathematical problem-solving, and multilingual processing. The models have been trained on a massive dataset that includes high-quality data in Chinese, English, and 27 other languages, leading to a substantial boost in their overall performance.
One of the key highlights of Qwen2 is its ability to handle long contexts, with the largest model, Qwen2-72B-Instruct, supporting up to 128K tokens. This extended context length enables the model to process and understand complex information from lengthy documents or conversations, significantly enhancing its potential for various applications.
Benchmarking and Comparison
Qwen2 models have been rigorously tested on multiple benchmark datasets, demonstrating their superior performance compared to other leading LLMs. Notably, the Qwen2-72B model surpasses Meta's Llama-3-70B and Qwen1.5's 110B model in several key areas, including natural language understanding, knowledge, code, mathematics, and multilingual capabilities.
In 16 benchmark tests, Qwen2-72B-Instruct achieved a remarkable balance between foundational capabilities and alignment with human values, surpassing Qwen1.5’s 72B model and rivaling Llama-3-70B-Instruct.
Specific Strengths and Improvements
Qwen2 exhibits significant strengths in code and mathematics, building on lessons from the CodeQwen1.5 model and achieving improved performance across various programming languages. Its mathematical capabilities have been enhanced through the use of large-scale, high-quality data, leading to a leap in problem-solving abilities.
The Instruct models within the Qwen2 series have been trained on a 32K context length and further extended to handle even longer contexts using techniques like YaRN. This allows Qwen2-72B-Instruct to flawlessly handle information extraction tasks involving up to 128K tokens.
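The core idea behind context-extension techniques in this family is to rescale rotary position embeddings (RoPE) so that positions beyond the trained window map back into it. The sketch below illustrates simple linear position interpolation, a simplification of the scaling family YaRN belongs to, not the full YaRN method; the function name and parameters are illustrative.

```python
def rope_angles(position: int, dim: int, base: float = 10000.0,
                scale: float = 1.0) -> list[float]:
    """RoPE rotation angles for a single token position.

    With scale > 1, positions are compressed by that factor, so a
    position outside the trained window maps to one inside it.
    This is plain linear interpolation; YaRN additionally treats
    high- and low-frequency dimensions differently.
    """
    return [
        (position / scale) / base ** (2 * i / dim)
        for i in range(dim // 2)
    ]

# Extending a 32K-trained window to 128K implies a scale factor of 4:
# position 128000 under scale=4 sees the same angles as position 32000
# did during training.
assert rope_angles(128_000, 64, scale=4.0) == rope_angles(32_000, 64)
```

The practical upshot is that the model is never shown position values larger than those it saw during training, at the cost of finer-grained position resolution.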
Safety and Security
In terms of safety, Qwen2-72B-Instruct demonstrates comparable performance to GPT-4 in the category of multilingual unsafe queries. It significantly outperforms Mixtral-8x22B, reducing the likelihood of generating harmful responses.
Multilingual Proficiency
Qwen2 excels in multilingual evaluations, showcasing enhanced capabilities in 27 languages. The models have been optimized to address language translation issues, minimizing the occurrence of language switching errors.
Open-Source Availability
The Qwen2 series is now available on HuggingFace and ModelScope platforms, enabling researchers and developers to access and utilize these powerful models for their own projects. This open-source release fosters collaboration and innovation within the AI community, promoting the development of new and exciting applications.
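For readers who want to try the release, a minimal loading sketch with the Hugging Face `transformers` library is shown below. The model id `"Qwen/Qwen2-7B-Instruct"` is assumed from the series naming; check the Hugging Face or ModelScope model pages for the exact identifiers, and note that the larger checkpoints require substantial GPU memory.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer


def build_messages(user_prompt: str) -> list[dict]:
    """Assemble a chat in the standard role/content message format."""
    return [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": user_prompt},
    ]


def generate(prompt: str, model_id: str = "Qwen/Qwen2-7B-Instruct") -> str:
    """Download the model, format the chat, and generate a reply."""
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
    text = tokenizer.apply_chat_template(
        build_messages(prompt), tokenize=False, add_generation_prompt=True
    )
    inputs = tokenizer(text, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=128)
    # Decode only the newly generated tokens, skipping the prompt.
    new_tokens = output[0][inputs.input_ids.shape[-1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

The same checkpoints can also be pulled from ModelScope with its own SDK; the chat-template step is what ensures the prompt matches the format the Instruct models were tuned on.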
Conclusion
Alibaba's open-sourcing of the Qwen2 series represents a significant contribution to the advancement of large language models. With its enhanced capabilities, improved performance, and open-source availability, Qwen2 is poised to play a pivotal role in driving innovation across various sectors, from natural language processing and code generation to multilingual applications and beyond.
Source: https://ai-bot.cn/qwen2/