ReasoningBank is a novel agent memory framework that distills generalizable reasoning strategies from both successful and failed experiences, enabling an agent to continuously learn from experience after deployment. Agents are becoming increasingly crucial for tackling complex real-world tasks, ranging from general web navigation to assisting with large software engineering codebases. However, as these agents transition into persistent, long-running roles in the real world, they face a critical limitation: they struggle to analyze and learn from their successes and failures after deployment. Test-time scaling (TTS), that is, scaling compute at inference time, has proven highly effective in reasoning domains such as math and competitive programming. In agentic environments, however, existing TTS methods often discard the exploration trajectory and treat the final answer as the only useful outcome. This overlooked exploration is in fact a rich source of signal that could accelerate an agent's ability to learn from experience over time. ReasoningBank provides a powerful framework for enabling LLMs to learn from experience and evolve into continual learners at test time. We believe memory-driven experience scaling represents a crucial new frontier for agent scaling.
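To make the core loop concrete, here is a minimal, hypothetical sketch of such a memory bank: strategies are distilled from both successful and failed trajectories and retrieved for new tasks. All names (`MemoryItem`, `ReasoningBankSketch`) and the keyword-overlap retrieval are illustrative assumptions, not the paper's actual implementation; in the real framework an LLM would perform the distillation and retrieval would use embedding similarity.

```python
from dataclasses import dataclass

@dataclass
class MemoryItem:
    # A distilled strategy: a short title plus the reasoning hint itself.
    title: str
    strategy: str
    from_success: bool  # distilled from a success (True) or a failure (False)

class ReasoningBankSketch:
    """Toy memory bank: store strategies distilled from both successes
    and failures, then retrieve the most relevant ones for a new task."""

    def __init__(self):
        self.items: list[MemoryItem] = []

    def distill(self, task: str, lesson: str, succeeded: bool) -> MemoryItem:
        # Placeholder for LLM-based distillation: tag the lesson by outcome.
        label = "do" if succeeded else "avoid"
        item = MemoryItem(title=f"{label}: {task}", strategy=lesson,
                          from_success=succeeded)
        self.items.append(item)
        return item

    def retrieve(self, query: str, k: int = 2) -> list[MemoryItem]:
        # Toy relevance score: word overlap between the query and item titles.
        q = set(query.lower().split())
        scored = sorted(self.items,
                        key=lambda m: len(q & set(m.title.lower().split())),
                        reverse=True)
        return scored[:k]

bank = ReasoningBankSketch()
bank.distill("filter search results by price",
             "use the sort dropdown before paginating", succeeded=True)
bank.distill("filter search results by date",
             "do not rely on infinite scroll to load every item", succeeded=False)
hits = bank.retrieve("filter results by price", k=1)
print(hits[0].title)
```

Note that failures are stored alongside successes: a lesson about what to avoid is retrieved and injected into the agent's context just like a positive strategy, which is the key difference from memory schemes that keep only winning trajectories.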