AI Papers Reader

Personalized digests of latest AI research


New AI Approach Enhances Autonomous Research Agents

San Francisco, CA - Researchers at Salesforce AI Research have unveiled a novel reinforcement learning (RL) framework, dubbed SFR-DeepResearch (SFR-DR), designed to empower single large language models (LLMs) to autonomously conduct complex research tasks. This new method focuses on refining the reasoning and tool-use capabilities of existing LLMs, enabling them to perform intricate, long-horizon tasks without explicit step-by-step human guidance.

Traditional “Deep Research” (DR) agents, often operating in multi-agent systems, rely on predefined workflows where different agents handle specific roles like planning or information retrieval. In contrast, SFR-DR utilizes a single, autonomous agent that dynamically determines its next action based on the evolving context. This approach offers greater flexibility and can generalize better to unseen tasks. Furthermore, a capable single agent can serve as a building block within larger multi-agent systems, reducing the complexity of the overall system.
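The contrast with fixed workflows can be sketched as a minimal agent loop in which the model itself decides, at every step, whether to call a tool or answer. This is an illustrative sketch only; the `llm` callable, the action format, and the tool names are assumptions, not the paper's actual interface.

```python
# Minimal single-agent loop: the model picks its own next action from the
# evolving context instead of following a predefined multi-agent workflow.
# The `llm` callable and the action dict format are illustrative assumptions.

def run_agent(llm, tools, task, max_steps=20):
    context = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        action = llm(context)  # model decides: call a tool, or finish
        if action["type"] == "final_answer":
            return action["content"]
        # Otherwise execute the chosen tool and feed the result back in
        result = tools[action["tool"]](**action["args"])
        context.append({"role": "tool", "content": result})
    return None  # step budget exhausted without a final answer
```

Because nothing about the sequence of steps is hard-coded, the same loop serves planning-heavy and retrieval-heavy tasks alike.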

The core innovation of SFR-DR lies in its “agentic inference pipeline” and a unique “RL training recipe.” The inference pipeline equips LLMs with a memory management system that allows them to handle virtually unlimited context, mirroring how humans manage information over extended periods. For instance, when an agent has gathered too much information, it can utilize a “clean_memory” tool to discard less important details, preventing its operational memory from becoming overwhelmed.
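The memory-management idea can be sketched as follows. This is a hypothetical simplification, not the paper's implementation: the budget, the `summarize` callable (standing in for an LLM summarization call), and the keep-the-latest-entry policy are all assumptions.

```python
# Hypothetical sketch of a "clean_memory"-style tool: when accumulated tool
# output exceeds a character budget, compress older entries into a summary
# and keep only the most recent observation verbatim.
# `summarize` stands in for an LLM call; all specifics are assumptions.

def clean_memory(memory, summarize, budget=4000):
    total = sum(len(entry) for entry in memory)
    if total <= budget:
        return memory  # still within budget: nothing to discard
    # Compress everything except the most recent entry into one summary
    summary = summarize(memory[:-1])
    return [summary, memory[-1]]
```

By repeatedly compressing in this way, the agent's working context stays bounded no matter how long the research episode runs.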

The RL training recipe is particularly noteworthy for its use of entirely synthetic data to train the agents. This approach allows for the creation of complex, reasoning-intensive datasets that are more challenging than existing ones. The researchers developed a modified REINFORCE algorithm to stabilize the training process, which can be prone to instability due to the diverse lengths of agent interactions. Techniques like “temporal advantage normalization” and “strategic trajectory filtering” are employed to prevent degenerate behaviors, such as repetitive tool usage.
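The general shape of normalizing advantages while filtering out degenerate trajectories can be illustrated as below. This is a hedged sketch of the idea, not the paper's exact recipe: the zero-advantage treatment of filtered trajectories and the group-statistics normalization are assumptions for illustration.

```python
# Illustrative sketch: zero out degenerate trajectories (e.g., repetitive
# tool calls) so they contribute no gradient, then normalize the remaining
# rewards to stabilize a REINFORCE-style update. Details are assumptions.

def filtered_normalized_advantages(rewards, degenerate, eps=1e-8):
    kept = [r for r, bad in zip(rewards, degenerate) if not bad]
    if len(kept) < 2:
        return [0.0] * len(rewards)  # too few samples to normalize
    mean = sum(kept) / len(kept)
    var = sum((r - mean) ** 2 for r in kept) / len(kept)
    std = var ** 0.5
    # Filtered trajectories get zero advantage; the rest are standardized
    return [0.0 if bad else (r - mean) / (std + eps)
            for r, bad in zip(rewards, degenerate)]
```

Standardizing within the surviving group keeps gradient magnitudes comparable across batches even when trajectory lengths and rewards vary widely.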

To demonstrate the efficacy of SFR-DR, the researchers applied it to several open-source LLMs. Their best-performing variant, SFR-DR-20B, achieved an impressive 28.7% accuracy on the Humanity’s Last Exam (HLE) benchmark, a significant improvement over its base model. This success highlights the ability of SFR-DR to enhance both the tool-use and complex reasoning abilities of LLMs.

The system utilizes a minimal set of tools, including a basic internet search function, a webpage scraping tool, and a Python code interpreter. These tools are deliberately designed to provide sufficient functionality without making the tasks trivially easy, thereby incentivizing the agent to explore and utilize its capabilities more effectively. For example, the search_internet tool returns only the top 10 search results, forcing the agent to be strategic in its queries. The browse_page tool provides readable content but removes hyperlinks, making discovery of new information dependent on further searches rather than direct navigation. The code_interpreter executes Python code locally but is stateless, meaning it doesn’t retain information between executions, a deliberate design choice to focus the agent’s reasoning.
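The three constraints described above can be sketched as tool interfaces. These are stand-in implementations that only illustrate the stated restrictions (top-10 results, link-free page text, stateless execution); the search backend, HTML handling, and result format are all assumptions.

```python
# Illustrative interfaces for the three tools, reflecting the constraints
# described in the article. Backends and formats are stand-in assumptions.
import re

def search_internet(query, backend):
    """Return at most the top 10 result snippets from the assumed backend."""
    return backend(query)[:10]

def browse_page(html):
    """Return readable text with hyperlinks stripped, so new pages can only
    be discovered via further searches, not direct navigation."""
    text = re.sub(r"<a\b[^>]*>(.*?)</a>", r"\1", html)  # keep anchor text only
    return re.sub(r"<[^>]+>", "", text).strip()

def code_interpreter(source):
    """Execute Python statelessly: a fresh namespace on every call, so
    nothing persists between executions."""
    namespace = {}
    exec(source, namespace)
    return namespace.get("result")
```

The statelessness of `code_interpreter` is visible directly: a variable defined in one call is simply absent in the next.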

The researchers also examined the effectiveness of their modified agentic workflow, demonstrating that re-framing multi-turn conversations into a single-turn context significantly boosts performance. Many LLMs are heavily trained on single-turn reasoning tasks, and this reframing aligns the agent’s inference process with its existing strengths.
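The reframing idea can be sketched as packing the full action/observation history into one prompt that asks for a single next decision. The template below is an assumption for illustration; the paper's actual prompt format is not reproduced here.

```python
# Hypothetical sketch: flatten a multi-turn trajectory into one single-turn
# prompt, so the model reasons over the whole history in a format closer to
# its single-turn training data. The template wording is an assumption.

def to_single_turn(task, history):
    lines = [f"Task: {task}", "", "Previous actions and observations:"]
    for step, (action, observation) in enumerate(history, 1):
        lines.append(f"{step}. Action: {action}")
        lines.append(f"   Observation: {observation}")
    lines.append("")
    lines.append("Decide the single next action.")
    return "\n".join(lines)
```

Each inference call then looks like an ordinary single-turn question, rather than a long multi-turn dialogue the model was rarely trained on.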

Overall, SFR-DeepResearch represents a significant step towards building more capable and autonomous AI agents for complex information-gathering and reasoning tasks, paving the way for more sophisticated AI-driven research and problem-solving.