Google has unveiled Gemini 2.5 Pro, the first member of its Gemini 2.5 family of thinking models. The model achieves top scores on multiple benchmarks and demonstrates a significant leap in reasoning capability compared to OpenAI's models.
In a move that underscores the intensifying competition in the AI landscape, Google's Gemini 2.5 Pro has emerged as a frontrunner, surpassing several well-known models, including OpenAI's o3-mini, Anthropic's Claude 3.7 Sonnet, xAI's Grok-3, and DeepSeek-R1. The model debuted with a score of 1443 on the widely recognized Chatbot Arena leaderboard run by the Large Model Systems Organization (LMSYS), taking first place with a decisive 39-point lead over the runner-up.
According to a report by the Chinese tech media outlet Zhidongxi (Zhidx), Gemini 2.5 Pro also led the challenging Humanity's Last Exam benchmark, scoring nearly 5 percentage points higher than OpenAI's o3-mini, a roughly 34% relative improvement.
One of the key features of Gemini 2.5 Pro is its support for a 1 million token context window, which is expected to expand to 2 million tokens soon. This large context window allows the model to process and understand significantly more information, enabling it to perform more complex reasoning tasks.
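To make the 1-million-token figure concrete, a developer can measure how much of the window a large input would consume before sending it. The sketch below assumes the `google-generativeai` Python SDK and uses an illustrative model ID and file path; neither is confirmed by this article:

```python
import google.generativeai as genai

# Authenticate with an API key created in Google AI Studio.
genai.configure(api_key="YOUR_API_KEY")

# Model ID is an assumption based on the experimental launch naming;
# check Google AI Studio for the current identifier.
model = genai.GenerativeModel("gemini-2.5-pro-exp-03-25")

# Load a large document (the path is a placeholder).
with open("large_codebase_dump.txt", encoding="utf-8") as f:
    text = f.read()

# count_tokens reports how much of the ~1M-token window the input uses.
token_count = model.count_tokens(text).total_tokens
print(f"Input uses {token_count:,} of the ~1,000,000-token window")

if token_count < 1_000_000:
    response = model.generate_content(["Summarize this document:", text])
    print(response.text)
```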
Currently, Gemini 2.5 Pro is available to developers through Google AI Studio, with integration into Google's Vertex AI platform to follow. Users with a Gemini Advanced subscription can also try the new model. Google plans to announce pricing in the coming weeks, which will allow developers to use Gemini 2.5 Pro commercially at scale with higher rate limits.
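For developers, access runs through the Gemini API behind Google AI Studio. As a minimal, self-contained sketch (again assuming the `google-generativeai` Python SDK and the experimental launch-era model ID, which may have changed):

```python
import google.generativeai as genai

# Authenticate with an API key created in Google AI Studio.
genai.configure(api_key="YOUR_API_KEY")

# Model ID is an assumption; verify the current name in Google AI Studio.
model = genai.GenerativeModel("gemini-2.5-pro-exp-03-25")

# Thinking models are aimed at multi-step reasoning prompts like this one.
response = model.generate_content(
    "Explain, step by step, why the sum of two odd integers is always even."
)
print(response.text)
```

An API key from Google AI Studio is enough to start experimenting; Vertex AI is the route for enterprise deployments once the model is integrated there.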
While Google has not released benchmark comparisons between Gemini 2.5 Pro and OpenAI's o1, o1-Pro, and o3 models, the available data suggests a significant advance in Google's AI capabilities. It is worth noting, however, that Gemini 2.5 Pro scores lower than Claude 3.7 Sonnet on SWE-bench Verified, a benchmark for agentic coding.
Despite this, Gemini 2.5 Pro’s overall performance across various benchmarks, including the LMSYS Arena and Humanity’s Last Exam, highlights its potential to revolutionize a wide range of applications, from programming and mathematics to science and general knowledge.
References:
- Google AI Blog: https://blog.google/technology/google-deepmind/gemini-model-thinking-updates-march-2025/#gemini-2-5-pro
- Google AI Studio: https://aistudio.google.com