## AI Automatically Designs Agents, Raising Math Accuracy by 25.9% and Far Surpassing Hand-Designed Baselines

**Reported by Jiqizhixin (机器之心)**

In recent years, the rise of foundation models (FMs) such as GPT and Claude has provided strong support for building general-purpose agents, which are now widely applied to reasoning and planning tasks. However, the agents needed in the real world are often not single, monolithic model queries but compound agent systems made up of multiple components. To handle complex tasks, these agents also need access to external tools, such as search engines, code execution, and database queries.

Existing research has proposed many effective building blocks for agent systems, such as chain-of-thought planning and reasoning, memory structures, tool use, and self-reflection. Developing and combining these building blocks, however, usually requires domain-specific manual tuning, which consumes significant human effort and time.

To address this problem, researchers from the University of British Columbia and the Vector Institute, a non-profit AI research organization, have proposed a new research area: Automated Design of Agentic Systems (ADAS). They developed a simple yet effective ADAS algorithm called "Meta Agent Search", which aims to automatically create powerful and novel agent system designs by programming them in code.

**Key components of the ADAS algorithm include:**

* **Search space:** Defines which agent systems can be represented, and therefore discovered, in ADAS.
* **Search algorithm:** Defines how the ADAS algorithm explores the search space, which requires balancing exploration against exploitation.
* **Evaluation function:** Defines how candidate agents are evaluated on metrics such as performance, cost, latency, or safety.
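
The three components listed above can be pictured with a small Python sketch. It is only an illustration under stated assumptions: the names `Agent`, `evaluate`, and `pick_best` are hypothetical and do not come from the ADAS work, the "search space" is simply any Python function from a question to an answer, the evaluation function measures exact-match accuracy only, and the "search algorithm" is a trivial greedy pick with no exploration.

```python
from typing import Callable, List, Tuple

# Hypothetical names for illustration; none of them come from the ADAS paper.
# Search space (assumed): any Python function mapping a question to an answer.
Agent = Callable[[str], str]
Example = Tuple[str, str]  # (question, expected answer)


def evaluate(agent: Agent, examples: List[Example]) -> float:
    """Evaluation function: fraction of examples answered exactly right."""
    if not examples:
        return 0.0
    correct = sum(1 for question, answer in examples if agent(question).strip() == answer)
    return correct / len(examples)


def pick_best(candidates: List[Agent], examples: List[Example]) -> Agent:
    """Search algorithm (purely greedy): return the highest-scoring candidate."""
    return max(candidates, key=lambda agent: evaluate(agent, examples))


# Two toy candidate agents drawn from the search space.
def echo_agent(question: str) -> str:
    return question


def upper_agent(question: str) -> str:
    return question.upper()


examples = [("a", "A"), ("b", "B"), ("c", "c")]
best = pick_best([echo_agent, upper_agent], examples)
print(best.__name__, evaluate(best, examples))
```

A real evaluation function would also have to account for the other metrics mentioned above, such as cost, latency, or safety; this sketch scores accuracy alone.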

**The core idea of Meta Agent Search is to instruct a meta agent to iteratively create new agents, evaluate them, and add them to the search space.**
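
The create, evaluate, and add loop can be sketched in a few lines of Python. This is a toy under loudly stated assumptions, not the authors' implementation: the foundation-model meta agent is replaced by a random stub (`stub_meta_agent`), agents are tiny linear functions rather than programs, the list of discovered agents is called `archive` here as a naming choice, and the task (recovering `3 * x + 2`) is invented for illustration.

```python
import random
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class LinearAgent:
    """Toy 'agent': computes a * x + b. Real agents would be programs that call an FM."""
    a: int
    b: int

    def __call__(self, x: int) -> int:
        return self.a * x + self.b


# Invented toy task: recover the function 3 * x + 2 from input/output cases.
CASES = [(x, 3 * x + 2) for x in range(5)]
Archive = List[Tuple[LinearAgent, float]]


def evaluate(agent: LinearAgent) -> float:
    """Higher is better: negative mean absolute error on the toy cases."""
    return -sum(abs(agent(x) - y) for x, y in CASES) / len(CASES)


def stub_meta_agent(archive: Archive, rng: random.Random) -> LinearAgent:
    """Stand-in for the FM meta agent (assumption): perturb the best agent so far."""
    best, _ = max(archive, key=lambda entry: entry[1])
    return LinearAgent(best.a + rng.choice([-1, 0, 1]), best.b + rng.choice([-1, 0, 1]))


def meta_agent_search(iterations: int, seed: int = 0) -> Archive:
    rng = random.Random(seed)
    initial = LinearAgent(0, 0)
    archive: Archive = [(initial, evaluate(initial))]
    for _ in range(iterations):
        candidate = stub_meta_agent(archive, rng)          # create a new agent
        archive.append((candidate, evaluate(candidate)))   # evaluate it and add it
    return archive


archive = meta_agent_search(iterations=50)
best_agent, best_score = max(archive, key=lambda entry: entry[1])
print(best_agent, best_score)
```

The point of the sketch is the shape of the loop: each new proposal is conditioned on everything discovered so far, so earlier discoveries can inform later ones.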

In experiments, the researchers showed that agents discovered through ADAS substantially outperform state-of-the-art hand-designed baselines. On the DROP reading comprehension task, the discovered agents improved the F1 score by 13.6 points out of 100, and on the MGSM math task they raised accuracy by 14.4%. Furthermore, after cross-domain transfer, they improved accuracy over the baselines by 25.9% on GSM8K and by 13.2% on GSM-Hard, both math tasks.
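
For readers unfamiliar with the F1 figure quoted for DROP, the following is a simplified sketch of a token-overlap F1 between a predicted answer and a reference answer. It assumes lowercasing and whitespace tokenization only; the official DROP metric applies further normalization and number handling that are not reproduced here, and the function name `token_f1` is hypothetical.

```python
from collections import Counter


def token_f1(prediction: str, reference: str) -> float:
    """Simplified token-overlap F1 in [0, 1]; multiply by 100 for a 0-100 scale."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        # Both empty counts as a match; exactly one empty counts as a miss.
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most as often as it appears in both.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


print(token_f1("the black cat", "the cat"))
```

On a 0-100 scale, an improvement of 13.6 points corresponds to a gain of 0.136 in this function's output averaged over the test set.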

**These results suggest that ADAS has considerable potential for automating the design of agent systems.** The experiments also indicate that the discovered agents perform well not only when transferred between similar domains but also when transferred between dissimilar ones, for example from math to reading comprehension.

The significance of this research lies in opening a new path toward automatically designing agent systems, which could accelerate AI development and extend its use to a wider range of fields.

Source: https://www.jiqizhixin.com/articles/2024-08-22-5
