
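Subtitle: OpenAI's o1 has set off a wave of interest in LLM reasoning and chain-of-thought techniques, but a new paper reveals their limitations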

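With the arrival of OpenAI's o1, the reasoning ability of large language models (LLMs) and chain-of-thought (CoT) techniques have drawn wide attention. Many expect chain-of-thought to become standard in every LLM before long. Yet OpenAI itself acknowledges that o1 does not outperform GPT-4o on some tasks, language-centric tasks in particular. A recent paper by researchers from the University of Texas at Austin, Johns Hopkins University, and Princeton University questions how effective chain-of-thought really is, and it has sparked lively debate in the research community.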

Paper title: To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning
Paper: https://arxiv.org/pdf/2409.12183
GitHub repository: https://github.com/Zayne-sprague/To-CoT-or-not-to-CoT (to be updated)

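The paper examines how effective chain-of-thought is at helping LLMs solve different kinds of problems. The research team analyzed the recent literature and compared the performance of CoT against direct answering (DA). They ran experiments on 20 datasets with 14 mainstream LLMs, under both zero-shot and few-shot prompting. The results show that CoT performs well on tasks involving math and symbolic reasoning, but brings no significant benefit on other tasks and may even lower model performance.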
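To make the CoT versus DA comparison concrete, here is a minimal Python sketch of the two zero-shot prompt styles and of the accuracy difference between them. The prompt wording, the function names, and the grading inputs are illustrative assumptions, not the paper's actual templates or evaluation code; "Let's think step by step." is simply a commonly used zero-shot CoT trigger phrase.

```python
from typing import Sequence

# Assumed trigger phrase; the paper's exact prompts are not given in this article.
COT_TRIGGER = "Let's think step by step."


def build_direct_prompt(question: str) -> str:
    """Direct answering (DA): ask for the final answer with no reasoning."""
    return f"Q: {question}\nAnswer with only the final answer.\nA:"


def build_cot_prompt(question: str) -> str:
    """Zero-shot CoT: invite the model to reason before it answers."""
    return f"Q: {question}\nA: {COT_TRIGGER}"


def accuracy(correct: Sequence[bool]) -> float:
    """Fraction of graded answers that were marked correct."""
    if not correct:
        raise ValueError("need at least one graded answer")
    return sum(correct) / len(correct)


def cot_gain(cot_correct: Sequence[bool], da_correct: Sequence[bool]) -> float:
    """CoT accuracy minus DA accuracy on the same set of questions."""
    return accuracy(cot_correct) - accuracy(da_correct)
```

In use, each question would be sent to a model once with each prompt, the answers graded, and `cot_gain` computed per dataset. A gain near zero or below zero on a dataset corresponds to the "no significant benefit" case described above.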

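In addition, CoT beats direct answering at carrying out computation and symbolic manipulation, but it falls somewhat short of LLMs that can use external tools. In other words, on the problems where CoT helps, using an external tool yields better results; on other problems, CoT's ability is limited.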
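The idea of handing the computation to an external tool can be sketched as follows: the model only writes down an arithmetic expression, and a small deterministic evaluator computes the result. This is a hedged illustration, not the paper's setup. The name `evaluate_plan` and the restriction to basic arithmetic are assumptions made for the example, and the expression string stands in for text a model would have produced.

```python
import ast
import operator

# Arithmetic operators the hypothetical tool is willing to execute.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.USub: operator.neg,
}


def _eval(node: ast.AST) -> float:
    """Recursively evaluate a parsed arithmetic expression."""
    if isinstance(node, ast.Constant):
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(node.value, (int, float)) and not isinstance(node.value, bool):
            return node.value
    elif isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
    elif isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](_eval(node.operand))
    raise ValueError("unsupported expression")


def evaluate_plan(expression: str) -> float:
    """Execute a model-written arithmetic expression outside the model."""
    tree = ast.parse(expression, mode="eval")
    return _eval(tree.body)


# The model supplies only the plan; the tool does the arithmetic.
print(evaluate_plan("(17 * 23) + 4"))  # 395
```

The design point is the split of responsibilities: the model still has to translate the problem into a plan, but the step where it is prone to slip (the arithmetic itself) is executed exactly.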

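The researchers argue that many problems commonly tackled with CoT do not actually need it, and that more efficient methods now exist that reach similar performance at lower inference cost. They stress that prompt-based CoT is no longer enough and that more sophisticated approaches are needed, such as search-based methods, interactive agents, or models better fine-tuned for CoT.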

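In short, chain-of-thought performs well on math and symbolic reasoning tasks but has limited effect elsewhere. The finding has prompted deeper discussion in the research community about how effective CoT is, and it offers a new perspective for future LLM research and applications.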

