
"Thinking" Isn't Everything: New AI Method Scales Agent Performance by Focusing on Actions

Pittsburgh, PA - For years, AI researchers have focused on building “thinking” into their language models, believing that more complex reasoning is the key to unlocking greater capabilities. However, a new paper suggests that there is another crucial element that has been largely overlooked: interaction. The study, titled “Thinking vs. Doing: Agents that Reason by Scaling Test-Time Interaction,” introduces a novel approach called Test-Time Interaction (TTI), which scales the number of actions an agent can take within an environment, resulting in significant improvements in performance.

The researchers, from Carnegie Mellon University, University of Illinois Urbana-Champaign, and other institutions, argue that the current paradigm, which emphasizes long reasoning traces before action, is insufficient for tasks that require constant adaptation to the environment. In these cases, agents benefit more from active exploration and backtracking than from simply “thinking” harder.

Consider the example of booking a hotel online. An agent tasked with finding the cheapest hotel might initially identify a seemingly good option. However, if it only “thinks” about this option without taking further actions, it might miss out on a better deal that appears later in the search results. TTI allows the agent to browse more listings, compare reviews, and check availability before making a final decision, resulting in a better outcome.
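To make the contrast concrete, here is a toy Python sketch, invented for this article rather than taken from the paper: the “environment” is a hypothetical hotel-search page that reveals one listing per step, and the only difference between a shallow and a thorough agent is its interaction budget.

```python
import random

# Toy illustration of interaction scaling (not the paper's code). The
# environment reveals one hotel price per "scroll" action; the agent
# simply books the cheapest price it has seen once its budget runs out.

class ToyHotelSearch:
    """Reveals one hotel price per interaction step."""
    def __init__(self, seed=0):
        rng = random.Random(seed)
        self.prices = [rng.randint(80, 300) for _ in range(50)]
        self.cursor = 0

    def step(self):
        price = self.prices[self.cursor]
        self.cursor += 1
        return price

def run_episode(env, max_steps):
    """Scroll through up to max_steps listings, then book the cheapest."""
    best = float("inf")
    for _ in range(max_steps):
        best = min(best, env.step())
    return best

# Same policy, larger interaction budget: the agent sees more of the
# page before committing, so it tends to find a cheaper hotel.
print(run_episode(ToyHotelSearch(), max_steps=3))
print(run_episode(ToyHotelSearch(), max_steps=30))
```

The point is not the toy policy but the axis being scaled: more steps of interaction with the environment, rather than more tokens of reasoning per step.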

“This is a significant departure from traditional methods,” explains lead author Junhong Shen. “Instead of just telling the AI to ‘think harder,’ we’re enabling it to ‘act smarter’ by giving it the freedom to explore and adapt.”

The study demonstrates the effectiveness of TTI in web-based environments. Training a Gemma 3 12B model with TTI, the researchers produced state-of-the-art open-source, open-data web agents on the WebVoyager and WebArena benchmarks. A key innovation of TTI is its curriculum-based online reinforcement learning (RL) approach: agents are trained with adaptively adjusted rollout lengths, starting with simpler tasks and gradually increasing the interaction horizon. This lets agents first learn effective exploitation strategies before extending their horizon to explore.
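The paper’s exact schedule and RL update are not spelled out in this digest, so the sketch below is a minimal, assumed version of the curriculum idea: the permitted interaction horizon starts small and grows across training rounds, with each round’s rollouts capped at that horizon before an online RL update.

```python
# Minimal sketch of a horizon curriculum (assumed details; the paper's
# actual schedule may differ). Early rollouts are kept short so the
# agent learns to exploit; later rounds allow longer horizons so it can
# afford to explore and backtrack.

def horizon_schedule(round_idx, num_rounds, h_min=4, h_max=32):
    """Interaction horizon permitted at a given training round."""
    frac = round_idx / max(1, num_rounds - 1)
    return int(h_min + frac * (h_max - h_min))

# Each round would collect rollouts capped at the scheduled horizon and
# apply an online RL update (e.g. a policy gradient) to the agent.
for r in range(6):
    print(f"round {r}: rollouts capped at {horizon_schedule(r, 6)} steps")
```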

Experiments showed that even simply prompting an agent to “check again” after it decides a task is complete can improve its success rate. On the WebArena benchmark, this check-again prompt alone raised a baseline system’s task success from 23% to over 28%.
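A self-contained toy version of that trick looks like this (the agent, listings, and nudge below are all invented for illustration; the paper’s prompt and agent interface differ): when the agent first declares the task done, it is re-prompted to verify once more before its answer is accepted.

```python
PRICES = [120, 95, 210, 88]  # hypothetical hotel listings, in page order

def greedy_agent(seen, nudged):
    """Stops at the first listing unless nudged to keep checking."""
    if not nudged:
        return ("stop", seen[0])   # premature: books the first hotel seen
    return ("stop", min(seen))     # after the nudge: compares everything seen

def run(check_again):
    seen, nudged = [PRICES[0]], False
    while True:
        action, answer = greedy_agent(seen, nudged)
        if action == "stop" and check_again and not nudged:
            nudged = True          # "check again" nudge instead of accepting
            seen = list(PRICES)    # the nudged agent browses the rest of the page
            continue
        return answer

print(run(check_again=False))  # 120: accepts the first, premature answer
print(run(check_again=True))   # 88: the second look finds a better deal
```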

“Our results show that interaction scaling is a powerful, complementary axis to scaling per-step compute,” says Aviral Kumar, another author of the study. “It offers new avenues for training adaptive agents that can dynamically balance exploration and exploitation.”

While the study focuses on web agents, the researchers believe that TTI has broader implications for other domains, such as robotics and open-world computer-use problems. By enabling agents to learn and adapt through interaction, TTI opens up new possibilities for building more intelligent and capable AI systems.