AgentStore: A Scalable Platform for Integrating Heterogeneous Agents as Specialized Generalist Computer Assistants
The rise of digital agents capable of automating complex computer tasks has attracted considerable attention due to their potential to enhance human-computer interaction. However, existing agent methods struggle to both generalize and specialize, especially when handling open-ended computer tasks in real-world environments. Imagine needing to perform a series of tasks on your computer. For instance, you need to create a new spreadsheet, add a column to calculate profit margins, and then highlight all rows where total sales exceed $1,000. This scenario requires specialized skills within a spreadsheet application. However, you might also need to find information online, access a specific file on your system, or perform basic web browser operations. Handling these diverse tasks demands the ability to generalize across different applications and systems. Existing methods often fall short in these scenarios.
To address this challenge, researchers at Xi’an Jiaotong University and Shanghai AI Lab have developed AgentStore, a scalable platform designed to dynamically integrate heterogeneous agents for automating computer tasks. Inspired by the App Store, AgentStore empowers users to integrate third-party agents, allowing the system to continuously enrich its capabilities and adapt to rapidly evolving operating systems. This dynamic integration enables AgentStore to go beyond the limitations of existing methods, which rely on a fixed set of agents with limited capabilities.
The core of AgentStore is a novel MetaAgent equipped with the AgentToken strategy. MetaAgent acts as a manager, coordinating and utilizing diverse agents to handle both domain-specific and system-wide tasks. The AgentToken strategy enables efficient and effective management of a large and growing number of agents. Each agent is represented as a learnable token embedding in MetaAgent’s architecture, similar to a word token embedding in a language model. During inference, when MetaAgent predicts an agent token, it activates the corresponding agent to execute the task.
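The routing idea described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class `MetaAgentSketch`, the `<agent:...>` token format, the three-dimensional hand-set embeddings, and the dot-product scoring are all assumptions chosen for illustration. In the actual system the agent token embeddings are learned and the hidden state comes from a language model.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


def dot(a: List[float], b: List[float]) -> float:
    """Dot product used as a stand-in for the model's output scoring."""
    return sum(x * y for x, y in zip(a, b))


@dataclass
class Agent:
    name: str
    run: Callable[[str], str]  # hypothetical agent interface: task in, result out


class MetaAgentSketch:
    """Toy router: agent tokens share one scoring space with word tokens."""

    def __init__(self, word_embeddings: Dict[str, List[float]]):
        self.word_embeddings = dict(word_embeddings)
        self.agent_embeddings: Dict[str, List[float]] = {}
        self.agents: Dict[str, Agent] = {}

    def register_agent(self, agent: Agent, embedding: List[float]) -> str:
        # Integrating a new agent only adds one token embedding;
        # the existing word embeddings are left untouched.
        token = f"<agent:{agent.name}>"
        self.agent_embeddings[token] = embedding
        self.agents[token] = agent
        return token

    def predict_next_token(self, hidden_state: List[float]) -> str:
        # Score word tokens and agent tokens together, pick the best one.
        candidates = {**self.word_embeddings, **self.agent_embeddings}
        return max(candidates, key=lambda tok: dot(hidden_state, candidates[tok]))

    def handle(self, task: str, hidden_state: List[float]) -> str:
        token = self.predict_next_token(hidden_state)
        if token in self.agents:
            # An agent token was predicted: hand the task to that agent.
            return self.agents[token].run(task)
        return f"MetaAgent continues generating text (predicted {token!r})"


# Hand-set embeddings and hidden state, purely for demonstration.
meta = MetaAgentSketch({"the": [1.0, 0.0, 0.0]})
meta.register_agent(Agent("sheet", lambda t: f"[sheet] {t}"), [0.0, 1.0, 0.0])
meta.register_agent(Agent("web", lambda t: f"[web] {t}"), [0.0, 0.0, 1.0])
print(meta.handle("add a profit-margin column", [0.1, 0.9, 0.2]))
```

In this sketch, registering a third-party agent amounts to adding one entry to the token table, which is one way to picture how a growing set of agents could be managed without changing the rest of the model.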
Extensive experiments on three challenging benchmarks show that AgentStore overcomes the limitations of previous systems with narrow capabilities. For example, on the highly challenging OSWorld benchmark, AgentStore more than doubled the previous results. Its ability to improve agent systems in both generalization and specialization underscores its potential for building a specialized generalist computer assistant. The research team emphasizes the need for a platform that can adapt to users' evolving needs and offer a more comprehensive approach to automation.
The researchers believe that AgentStore, as an open platform, will integrate more powerful agents as basic AGI models continue to evolve, progressively advancing toward the vision of building the specialized generalist computer assistant.