Agents Learn to Master Software Autonomously Through "Self-Evolving" System

New research introduces SEAgent, a system that allows computer agents to learn and master unfamiliar software entirely through experience, bypassing the need for human-labeled data.

Researchers have developed SEAgent, a framework that enables computer-use agents (CUAs) to autonomously learn and evolve their skills in new software environments. The work, detailed in a recent paper, addresses a key limitation of current AI agents: their heavy reliance on large, human-annotated datasets. SEAgent instead relies on “experiential learning,” in which agents teach themselves to operate complex software through trial and error.

Imagine a new employee starting at a company, tasked with learning a piece of specialized software they’ve never encountered before. Instead of receiving a manual or extensive training sessions, this new employee is simply given the software and a series of progressively challenging tasks. They try a few things, make mistakes, observe the results, and gradually refine their understanding and actions. This is the essence of SEAgent.

The system works by first generating a “software guidebook” of the environment. Then, a “Curriculum Generator” creates a series of tasks, starting with simple ones and increasing in complexity. An “Actor Model” attempts these tasks, and a “World State Model” meticulously analyzes each step, providing feedback on whether the action was successful or led to a failure. This feedback is crucial: successful actions are reinforced, while failed actions are analyzed to understand why they went wrong. This continuous loop of task execution, evaluation, and learning allows the agent to build expertise.
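The loop described above can be pictured with a toy simulation. This is a minimal sketch and not SEAgent's implementation: every name here (`Task`, `generate_tasks`, `actor_attempt`, `judge`, `run_loop`) is hypothetical, the "skill" number is a stand-in for a learned policy, and the update rule is an arbitrary assumption chosen only to show the shape of the cycle.

```python
import random
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    difficulty: int  # 1 = easiest


def generate_tasks(difficulty, guidebook, n_tasks=4):
    """Stand-in Curriculum Generator: proposes tasks at the current difficulty.
    (The guidebook is accepted but ignored in this toy version.)"""
    return [Task(f"level{difficulty}-task{i}", difficulty) for i in range(n_tasks)]


def actor_attempt(task, skill, rng):
    """Stand-in Actor Model: harder tasks succeed less often at a given skill."""
    p_success = max(0.05, min(0.95, skill - 0.1 * (task.difficulty - 1)))
    reached_goal = rng.random() < p_success
    return {"task": task.name, "final_state": "goal" if reached_goal else "other"}


def judge(trajectory):
    """Stand-in World State Model: labels a trajectory as success or failure."""
    return trajectory["final_state"] == "goal"


def run_loop(rounds=5, seed=0):
    rng = random.Random(seed)
    guidebook = ["initial notes about the software"]
    skill, difficulty, history = 0.3, 1, []
    for _ in range(rounds):
        tasks = generate_tasks(difficulty, guidebook)
        results = [judge(actor_attempt(t, skill, rng)) for t in tasks]
        successes = sum(results)
        failures = len(results) - successes
        # Toy update (assumption): both outcomes carry signal, successes more.
        skill = min(1.0, skill + 0.05 * successes + 0.02 * failures)
        rate = successes / len(tasks)
        guidebook.append(f"difficulty {difficulty}: success rate {rate:.2f}")
        history.append({"difficulty": difficulty, "success_rate": rate, "skill": skill})
        if rate >= 0.5:  # raise difficulty only once the current level is handled
            difficulty += 1
    return history


if __name__ == "__main__":
    for record in run_loop():
        print(record)
```

The point of the sketch is the ordering: tasks are proposed, attempted, judged, and only then used to update the agent and to decide whether the next round should be harder.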

A key innovation is the system’s ability to learn from mistakes. For instance, if an agent tries to click a button that doesn’t exist, SEAgent doesn’t just mark it as a failure. It actively learns to avoid that specific incorrect action in the future by using “adversarial imitation.” Conversely, successful actions are rewarded through a method called “Group Relative Policy Optimization” (GRPO).
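To give a feel for the "group relative" part of GRPO, here is a minimal sketch of how rewards for several attempts at the same task are commonly turned into advantages: each reward is compared with the group's mean and scaled by the group's spread. The function name, the `eps` constant, and the use of the population standard deviation are assumptions for illustration; the article does not specify SEAgent's exact formulation, and the adversarial imitation term is not shown.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of attempts at the same task.

    Attempts that beat the group average get a positive advantage,
    attempts below it get a negative one.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an illustrative choice
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Four attempts at one task: 1.0 = judged successful, 0.0 = judged failed.
    print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))
```

Because the advantage is relative to the group, a task on which every attempt succeeds (or every attempt fails) yields advantages of zero and contributes no learning signal in this sketch.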

Furthermore, SEAgent employs a “specialist-to-generalist” training strategy. Initially, individual agents are trained to become experts in specific software applications. These specialized skills are then distilled and integrated into a single, more capable generalist agent. The authors report that this approach is more effective than training a generalist agent from scratch. Compared to existing open-source agents, it raises success rates on novel software environments from 11.3% to 34.5%, a gain of 23.2 percentage points.
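The two success rates can be compared in more than one way, and the difference matters when reading the headline number. The short calculation below uses only the 11.3% and 34.5% figures quoted above; the function names are hypothetical.

```python
def absolute_gain(before, after):
    """Difference in percentage points."""
    return after - before


def relative_gain(before, after):
    """Improvement as a fraction of the starting value."""
    return (after - before) / before


if __name__ == "__main__":
    before, after = 11.3, 34.5
    print(f"absolute gain: {absolute_gain(before, after):.1f} percentage points")
    print(f"relative gain: {relative_gain(before, after) * 100:.1f}%")
    print(f"ratio: {after / before:.2f}x")
```

The 23.2 figure is the absolute gap in percentage points. In relative terms the success rate roughly triples, an increase of about 205% over the baseline.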

The researchers demonstrated SEAgent across five different software applications. The results point toward more versatile AI assistants that can learn to navigate new software without constant human oversight, which could change how people interact with software and automate complex tasks.