Scaling Up Agentic Intelligence: New Framework Paves Way for Smarter AI Assistants

New York, NY – September 17, 2025 – Researchers from Tongyi Lab at Alibaba Group have unveiled a novel framework designed to significantly enhance the capabilities of Large Language Models (LLMs) in real-world applications, particularly in their ability to interact with and utilize various external tools and services. Dubbed “AgentScaler,” this new approach focuses on “environment scaling” and a sophisticated two-phase training strategy to foster more robust and generalizable agentic intelligence.

The core challenge, as outlined in their paper, is equipping LLMs with precise and reliable “function-calling” intelligence. This is crucial for agents that need to interact with the complexities of the real world, which often involves calling diverse APIs. The effectiveness of these agents is directly tied to the variety of environments they are trained in. However, creating these diverse training environments has historically been a bottleneck, requiring extensive manual effort and often resulting in less realistic interactions.

AgentScaler tackles this head-on by introducing a scalable framework that automatically constructs heterogeneous, fully simulated environments. This process systematically broadens the spectrum of potential function-calling scenarios. For instance, imagine an AI assistant needing to book a flight. AgentScaler can generate a simulated environment where the agent must interact with a series of “tools” representing flight booking APIs. These tools might include functions to search for flights based on dates and destinations, check seat availability, and finally, make a reservation. The framework ensures that the agent learns to chain these calls correctly and handle the data passed between them, much like a human would navigate a complex booking website.
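
To make the idea of chained tool calls concrete, the following minimal Python sketch shows what such a simulated booking environment could look like. The function names, arguments, and in-memory data here are illustrative assumptions for this article, not the actual tools or schemas used in the paper.

```python
# Hypothetical simulated environment: a tiny in-memory "flight domain".
# Everything here (names, fields, data) is illustrative, not from the paper.

FLIGHTS = {
    "NY-LON-2025-10-01": {"flight_id": "AB123", "seats_left": 2, "price": 540},
}

def search_flights(origin: str, destination: str, date: str) -> list[dict]:
    """Read-only tool: return candidate flights for a route and date."""
    key = f"{origin}-{destination}-{date}"
    return [FLIGHTS[key]] if key in FLIGHTS else []

def check_seat_availability(flight_id: str) -> bool:
    """Read-only tool: check whether the flight still has free seats."""
    return any(f["flight_id"] == flight_id and f["seats_left"] > 0
               for f in FLIGHTS.values())

def book_flight(flight_id: str, passenger: str) -> dict:
    """Write tool: decrement availability and return a confirmation."""
    for f in FLIGHTS.values():
        if f["flight_id"] == flight_id and f["seats_left"] > 0:
            f["seats_left"] -= 1
            return {"status": "confirmed", "flight_id": flight_id, "passenger": passenger}
    return {"status": "failed", "reason": "no seats available"}

# An agent that has learned correct chaining calls the tools in order,
# passing the output of one call into the next:
candidates = search_flights("NY", "LON", "2025-10-01")
if candidates and check_seat_availability(candidates[0]["flight_id"]):
    confirmation = book_flight(candidates[0]["flight_id"], passenger="A. Reader")
```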

The paper highlights two key challenges: how to scale environments in a principled way and how to effectively train agentic capabilities from these simulated experiences. AgentScaler addresses the first by treating each function call as a read-write operation on an underlying “environmental database.” By organizing these tools into domains with specific database structures, the framework can programmatically generate realistic environments. For the second challenge, AgentScaler employs a two-stage agent fine-tuning strategy.
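
The read-write abstraction can be sketched as follows: an environment is just a domain-specific database plus a set of tools that read from or write to it, which is what makes programmatic generation possible. The class and method names below are assumptions made for illustration; the paper does not prescribe this exact interface.

```python
# Illustrative sketch: each tool call is a read or write against a shared,
# per-domain "environmental database". All names and schemas are assumptions.
from dataclasses import dataclass, field
from typing import Any, Callable

@dataclass
class DomainEnvironment:
    name: str                                               # e.g. "flight_booking"
    database: dict[str, Any] = field(default_factory=dict)
    tools: dict[str, Callable[..., Any]] = field(default_factory=dict)

    def register_tool(self, tool_name: str, fn: Callable[..., Any]) -> None:
        """Attach a tool; the tool receives the database as its first argument."""
        self.tools[tool_name] = fn

    def call(self, tool_name: str, **kwargs: Any) -> Any:
        """Execute a tool call as a read/write operation on the database."""
        return self.tools[tool_name](self.database, **kwargs)

# Because an environment is just (database schema + tools), many heterogeneous
# environments can be generated programmatically instead of built by hand.
env = DomainEnvironment(name="flight_booking",
                        database={"flights": {"AB123": {"seats_left": 2}}, "bookings": []})
env.register_tool("get_seats", lambda db, flight_id: db["flights"][flight_id]["seats_left"])  # read
env.register_tool("reserve", lambda db, flight_id: db["bookings"].append(flight_id))          # write
seats = env.call("get_seats", flight_id="AB123")   # -> 2
env.call("reserve", flight_id="AB123")             # mutates the shared state
```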

The first stage focuses on imparting fundamental agentic capabilities, teaching the AI to master basic tool usage and understand how to integrate tool outputs into coherent responses. This is akin to teaching a new employee the fundamental operations of a company’s software suite. The second stage then specializes these agents for domain-specific contexts, allowing them to refine their abilities within particular industries or task types. This is like training that new employee on the specific protocols and nuances of the marketing department.
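
A highly simplified sketch of that two-stage recipe is given below. The model, datasets, and training step are stubs standing in for real supervised fine-tuning; none of the names correspond to AgentScaler's actual training code.

```python
# Sketch of the two-stage fine-tuning strategy described above. The model,
# data, and update step are placeholders, not the paper's training pipeline.

class StubAgentModel:
    """Stand-in for an LLM agent; records which trajectories it was trained on."""
    def __init__(self):
        self.seen = []

    def update(self, trajectory: dict) -> None:
        self.seen.append(trajectory["id"])   # placeholder for a gradient step

def fine_tune(model: StubAgentModel, trajectories: list[dict], epochs: int) -> StubAgentModel:
    """Placeholder supervised fine-tuning over agentic tool-call trajectories."""
    for _ in range(epochs):
        for traj in trajectories:
            model.update(traj)
    return model

# Stage 1: broad, cross-domain trajectories teach basic tool usage and how to
# fold tool outputs back into a coherent response.
general_data = [{"id": "general-0", "domain": "retail"}, {"id": "general-1", "domain": "travel"}]
model = fine_tune(StubAgentModel(), general_data, epochs=1)

# Stage 2: continue from the stage-1 model on trajectories from one target
# domain, specializing the same skills to that domain's tools.
flight_data = [{"id": "flight-0", "domain": "flight_booking"}]
model = fine_tune(model, flight_data, epochs=1)
```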

Experimental results on established benchmarks like τ-bench, τ²-Bench, and ACEBench-en demonstrate AgentScaler’s significant improvements in function-calling capabilities. The paper reports that AgentScaler models achieve state-of-the-art performance among open-source models with fewer than 1 trillion parameters, often matching or even surpassing much larger or closed-source systems. This efficiency is a key advantage, making AgentScaler particularly promising for deployment in resource-constrained or latency-sensitive scenarios.

“Our work highlights the importance of scalable environment construction and verifiable agentic experience for fostering robust and generalizable language agents,” the authors conclude. Future directions include integrating reinforcement learning and extending the framework to broader modalities and real-world deployment. The AgentScaler framework represents a significant step towards building more capable and versatile AI agents that can effectively interact with and assist us in our daily lives.