Alibaba has open-sourced Qwen2-Math, a new series of models designed to solve mathematical problems, whose flagship currently tops global rankings for mathematical reasoning. Here is a summary of the key details:
- Qwen2-Math Series: The series includes Qwen2-Math-1.5B, Qwen2-Math-7B, and Qwen2-Math-72B models, all built on top of the Qwen2 family of large language models.
- Performance: The largest model in the series, Qwen2-Math-72B-Instruct, has outperformed other advanced models such as GPT-4o, Claude-3.5-Sonnet, Gemini-1.5-Pro, and Llama-3.1-405B in mathematical reasoning tasks.
- Evaluation: The models were evaluated on several mathematical benchmarks, including GSM8K, MATH, and MMLU-STEM, as well as Chinese-language benchmarks such as CMATH, GaoKao Math Cloze, and GaoKao Math QA.
- Training: The base models were initialized from Qwen2-1.5B/7B/72B and further pre-trained on a specialized mathematical corpus consisting of web texts, books, code, exam questions, and synthetic mathematical pre-training data generated by Qwen2.
- Instruction Tuning: A mathematics-specific reward model was trained, and the models were further optimized with GRPO (Group Relative Policy Optimization) based on its reward signals; a sketch of the group-relative advantage idea appears after this list.
- Capabilities: Qwen2-Math has demonstrated the ability to solve some simple competition-level math problems, including multiple International Mathematical Olympiad (IMO) problems; a minimal inference sketch also follows below.
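GRPO, introduced with DeepSeekMath, replaces PPO's learned value-function baseline with a baseline computed from a group of responses sampled for the same prompt. The sketch below illustrates only that group-relative advantage computation; it is a minimal illustration with hypothetical names, not Qwen's actual training code.

```python
import torch

def group_relative_advantages(rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """rewards: (num_prompts, group_size) reward-model scores, one row per
    prompt and one column per sampled response. GRPO normalizes each reward
    against its own group instead of using a learned value-function baseline."""
    mean = rewards.mean(dim=1, keepdim=True)
    std = rewards.std(dim=1, keepdim=True)
    return (rewards - mean) / (std + eps)

# Example: 2 prompts, 4 sampled solutions each, scored by a math reward model.
rewards = torch.tensor([[0.1, 0.9, 0.4, 0.6],
                        [0.2, 0.2, 0.8, 0.3]])
advantages = group_relative_advantages(rewards)
# Each advantage then weights the clipped policy-gradient term for the
# corresponding response, as in PPO but without a critic network.
print(advantages)
```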
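Since the weights are open-sourced, querying an instruct checkpoint should follow the standard `transformers` chat workflow. The following is a minimal sketch under that assumption; the Hugging Face repo id and the example prompt are illustrative, not taken from the announcement.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2-Math-7B-Instruct"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

messages = [
    {"role": "system", "content": "You are a helpful math assistant."},
    {"role": "user", "content": "Find the last digit of 7^2024."},
]
# Render the chat template and generate a step-by-step solution.
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```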
The information also includes a detailed case study in which Qwen2-Math-72B-Instruct solves an IMO Shortlist problem from 2002: finding the smallest positive integer \( t \) such that a sum of \( t \) integer cubes equals \( 2002^{2002} \).
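For readers checking the case study, the classical argument for this problem runs on cubes modulo 9; the sketch below is the standard textbook solution, not necessarily the model's exact write-up.

```latex
% Lower bound: every integer cube is congruent to 0 or \pm 1 modulo 9, so a
% sum of at most three cubes is congruent to one of 0, \pm 1, \pm 2, \pm 3
% (mod 9). Since 2002 \equiv 4 and 4^3 \equiv 1 (mod 9), and 2002 = 3 \cdot 667 + 1:
\[
  2002^{2002} \equiv 4^{2002} = \bigl(4^{3}\bigr)^{667} \cdot 4 \equiv 4 \pmod{9},
\]
% which no sum of three or fewer cubes can attain, so t \ge 4.
% Upper bound: from 2002 = 10^3 + 10^3 + 1^3 + 1^3, setting a = 2002^{667}:
\[
  2002^{2002} = 2002 \cdot \bigl(2002^{667}\bigr)^{3}
              = (10a)^3 + (10a)^3 + a^3 + a^3.
\]
% Hence the smallest such t is 4.
```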
The release of Qwen2-Math highlights Alibaba’s commitment to advancing AI capabilities in mathematical reasoning and problem-solving, potentially contributing to scientific research and education by addressing complex mathematical challenges.