SkillWeaver: AI Learns to Navigate the Web Like Humans Through Reusable 'Skills'
Imagine a person who has never seen the internet learning to use the web. They might struggle at first, but with experience they would learn common routines and build up a repertoire of skills. Researchers at The Ohio State University, University of Virginia, Purdue University, Carnegie Mellon University, and Cisco Research have developed “SkillWeaver,” a framework that lets web agents do just that: autonomously learn and improve their web-browsing abilities by synthesizing reusable skills, much as humans do.
The core idea behind SkillWeaver is to enable web agents to learn from their experiences by abstracting common actions into reusable skills represented as APIs (Application Programming Interfaces). These skills can then be used across different websites and even shared between agents, significantly boosting their performance.
Here’s how it works:
- Skill Proposal: The agent explores a new website and, using a Large Language Model (LLM), proposes potential skills or tasks it could learn. For instance, on a website like Drugs.com, a proposed skill might be “Identify pill using pill identifier”.
- Skill Synthesis: The agent then attempts to execute this proposed skill, generating trajectories of actions. It uses a reward model to determine if it successfully completed the task. If successful, the trajectory is then converted into a reusable Python function (an API) that encapsulates the learned behavior. For the “identify pill” skill, this API would contain code to navigate to the pill identifier page, fill in the pill’s imprint and color, and submit the information.
- Skill Honing: To ensure the API is robust, SkillWeaver automatically generates test cases and debugs the API based on feedback from the environment. For example, it might generate a test case to identify a white pill with the imprint “M366.” If the API fails, the system attempts to diagnose the error and refine the code.
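To make the synthesis and honing steps concrete, here is a minimal sketch of what a synthesized skill and its test run might look like. It is not the authors' code. `FakePage`, `SkillLibrary`, `hone`, `submitted_with_inputs`, the `/pill-identifier` path, and the field names are all hypothetical stand-ins, and the fake page only records actions instead of driving a real browser. The LLM-driven debugging step is left out: this sketch only decides whether a skill passes its generated tests.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class FakePage:
    """Hypothetical stand-in for a browser page; it only records actions."""
    actions: List[str] = field(default_factory=list)

    def goto(self, url: str) -> None:
        self.actions.append(f"goto {url}")

    def fill(self, field_name: str, value: str) -> None:
        self.actions.append(f"fill {field_name}={value}")

    def click(self, target: str) -> None:
        self.actions.append(f"click {target}")


def identify_pill(page: FakePage, imprint: str, color: str) -> None:
    """Illustrative synthesized skill: look up a pill by imprint and color."""
    page.goto("/pill-identifier")  # assumed path, not taken from the real site
    page.fill("imprint", imprint)  # assumed field name
    page.fill("color", color)      # assumed field name
    page.click("submit")


@dataclass
class SkillLibrary:
    """Hypothetical registry of skills that passed honing, keyed by name."""
    skills: Dict[str, Callable[..., None]] = field(default_factory=dict)

    def add(self, skill: Callable[..., None]) -> None:
        self.skills[skill.__name__] = skill


def submitted_with_inputs(page: FakePage, kwargs: dict) -> bool:
    """Toy success check: every argument was filled in, then the form was submitted."""
    filled = all(f"fill {k}={v}" in page.actions for k, v in kwargs.items())
    return filled and page.actions[-1:] == ["click submit"]


def hone(skill: Callable[..., None], test_cases: List[dict],
         check: Callable[[FakePage, dict], bool]) -> bool:
    """Run the skill on each generated test case; True only if all pass."""
    for kwargs in test_cases:
        page = FakePage()  # fresh page per test case
        try:
            skill(page, **kwargs)
        except Exception:
            return False  # a real system would feed this error back for repair
        if not check(page, kwargs):
            return False
    return True


library = SkillLibrary()
tests = [{"imprint": "M366", "color": "white"}]  # the test case from the text
if hone(identify_pill, tests, submitted_with_inputs):
    library.add(identify_pill)
```

Once a skill is in the library, another agent could call `library.skills["identify_pill"](page, imprint="M366", color="white")` instead of rediscovering the individual clicks and form fills.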
The researchers tested SkillWeaver on both WebArena, a simulated web environment, and real-world websites. On WebArena, agents equipped with SkillWeaver achieved relative success rate improvements of up to 31.8%. Even more impressive were the results with “weaker” agents, which saw improvements as high as 54.3% by leveraging APIs created by stronger agents. This shows that knowledge and skills can transfer between AI agents.
Experiments on real-world websites demonstrated SkillWeaver’s effectiveness in complex, dynamic environments, with a 39.8% relative improvement in success rate on tasks from the Online-Mind2Web benchmark. For example, the system learned to automate tasks such as searching for drug information on Drugs.com and finding recipes on Cookpad.
SkillWeaver represents a significant step towards creating more capable and adaptable web agents. By enabling agents to learn and reuse skills, it addresses the challenges of the web’s inherent complexity and diversity. This could lead to more efficient and helpful AI assistants that can navigate the web on our behalf, automating routine tasks and enhancing our productivity.