
News Title: "Small Model Breakthrough: Solving Complex Problems Without CoT Fine-tuning"

Keywords: Small Language Models, Mutual Validation, Complex Reasoning

News Content:
Title: Microsoft Research Asia Collaborates with Harvard University to Launch rStar, a New Algorithm That Enhances Small Language Model Reasoning

Body:
In recent years, with the rapid development of artificial intelligence, large language models (LLMs) have shown astonishing capabilities on natural language tasks. However, these models still struggle with complex reasoning, especially without fine-tuning or the assistance of more powerful models. To address this, a research team from Microsoft Research Asia and Harvard University has developed a new algorithm named rStar, which aims to enhance the reasoning ability of small language models (SLMs) without fine-tuning them or relying on stronger models.

The core idea of rStar is a mutual validation mechanism between two small language models, similar to two students checking each other's test answers to improve their scores. The algorithm divides reasoning into two stages: answer generation and mutual validation. In the answer generation stage, one SLM generates diverse reasoning steps to explore different solution spaces. In the mutual validation stage, a second SLM acts as a discriminator, cross-checking the reasoning trajectories produced by the first to ensure their correctness.
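The two-stage loop described above can be sketched in a few lines of Python. This is a hypothetical illustration, not the paper's actual implementation: `generator`, `discriminator`, and `extract_answer` are stand-ins for the two SLMs and an answer-parsing helper, and a trajectory counts as mutually validated when the discriminator, continuing the partial reasoning on its own, arrives at the same final answer.

```python
# Hedged sketch of rStar-style mutual validation. `generator` and
# `discriminator` are assumed to be callables wrapping two small language
# models; all names here are illustrative, not from the paper.

def generate_trajectories(generator, question, n_samples=4):
    """Stage 1: sample diverse candidate reasoning trajectories."""
    return [generator(f"Q: {question}\nReason step by step:")
            for _ in range(n_samples)]

def mutually_verified_answers(generator, discriminator, question, extract_answer):
    """Stage 2: keep only trajectories whose answer the discriminator reproduces."""
    verified = []
    for trajectory in generate_trajectories(generator, question):
        # The discriminator independently finishes the partial reasoning;
        # agreement on the final answer counts as mutual validation.
        check = discriminator(
            f"Q: {question}\nPartial reasoning: {trajectory}\nFinish the reasoning:")
        if extract_answer(check) == extract_answer(trajectory):
            verified.append(trajectory)
    return verified
```

In this sketch, disagreement simply discards a trajectory; a fuller system would rank the surviving trajectories rather than treating them all as equal.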

To address the exploration problem in SLM reasoning, rStar equips the model with a rich set of human-like reasoning actions, ensuring that it can explore a diverse space of reasoning paths. The algorithm also designs a reward function tailored to SLMs that evaluates the intermediate steps of the reasoning process, avoiding reliance on the SLM's often-unreliable self-evaluation.
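One way such an intermediate-step reward could work, sketched under assumptions of our own rather than taken from the paper, is to score a step by how consistently random continuations from it converge on the same final answer, so no model is asked to grade itself. The `rollout` callable below is an illustrative stand-in for sampling a completion from the SLM.

```python
# Hypothetical sketch of a process reward for intermediate reasoning steps:
# a step scores highly if rollouts continuing from it mostly agree on one
# final answer (a self-consistency signal, not the SLM judging itself).
from collections import Counter

def step_reward(rollout, question, partial_steps, n_rollouts=8):
    """Score an intermediate step by the agreement rate of its continuations."""
    answers = [rollout(question, partial_steps) for _ in range(n_rollouts)]
    best_answer, freq = Counter(answers).most_common(1)[0]
    return freq / n_rollouts  # fraction of rollouts agreeing on one answer
```

A search procedure could then expand high-reward steps first and prune steps whose continuations scatter across many inconsistent answers.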

To date, rStar has achieved significant results on multiple reasoning tasks, demonstrating its effectiveness at enhancing the reasoning ability of small language models. The research offers the AI field a new approach that could broaden the adoption of small language models and support a wider range of natural language processing tasks.

Source: https://www.jiqizhixin.com/articles/2024-08-16
