The Self-Improving Handyman: How Skill1 Gives AI Agents a Growing Library of Expertise

For years, artificial intelligence researchers have chased the dream of a truly autonomous agent—an AI that doesn’t just follow instructions, but learns from its mistakes and builds a repertoire of expertise. While today’s large language models (LLMs) are brilliant, they often suffer from “goldfish memory” when it comes to specific procedures. They might solve a complex task once, but the next time they face a similar challenge, they effectively start from scratch.

A new framework called Skill1 aims to change that. Developed by a multi-institutional team including researchers from the University of Science and Technology of China and Meituan, Skill1 allows AI agents to autonomously build, use, and refine their own “skill libraries.”

The Problem of Isolated Learning

In current AI design, “skill-augmented” agents usually operate in three separate phases: they select a skill from a database, utilize it to perform a task, and distill their experience into a new skill for future use.

The bottleneck has always been that these three phases are typically trained in isolation. An agent might be great at following a recipe (utilization) but terrible at finding the right one in its book (selection). Because the training signals for these tasks often conflict, the agent’s overall “evolution” stalls.
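
To make the conventional pipeline concrete, here is a minimal Python sketch of a skill-library entry and the three separately trained phase interfaces. The class, field, and function names are our own placeholders for illustration; they are not taken from the paper.

```python
from dataclasses import dataclass

@dataclass
class Skill:
    """One entry in the agent's skill library (illustrative fields)."""
    description: str      # reusable natural-language strategy
    successes: int = 0    # task successes observed when this skill was used
    attempts: int = 0     # total times this skill was used

# In the conventional design, each phase is trained on its own, often conflicting, objective:
def select(task: str, library: list[Skill]) -> Skill:
    """Retrieve the most relevant skill for the task (retrieval objective)."""
    ...

def utilize(task: str, skill: Skill) -> list[str]:
    """Produce a sequence of actions guided by the skill (action objective)."""
    ...

def distill(trajectory: list[str]) -> Skill:
    """Summarize the episode into a new skill (summarization objective)."""
    ...
```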

Skill1 solves this by training a single “policy”—the agent’s brain—to handle all three stages simultaneously using a single source of truth: whether or not the task actually succeeded.
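
In training terms, that amounts to something like a REINFORCE-style update in which the same binary outcome reinforces everything one policy generated across all three phases. The sketch below is our assumption about the shape of such an objective, not the paper's exact loss.

```python
def unified_loss(logp_selection, logp_utilization, logp_distillation, success):
    """One policy, one reward: the task outcome (success or failure) scales the
    log-probabilities of the text the policy generated in all three phases.
    In practice these log-probabilities would be differentiable tensors."""
    reward = 1.0 if success else 0.0
    return -reward * (logp_selection + logp_utilization + logp_distillation)
```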

Building Intuition: The Microwave Dilemma

To understand how Skill1 works, consider a task from the “ALFWorld” simulation used in the paper: “Heat a plate and put it in the cabinet.”

In a standard setup, an agent might try to heat the plate on a stoveburner, which fails because ceramic plates don’t heat well on open burners. Without a skill library, the agent might repeat this mistake tomorrow.

With Skill1, the process evolves:

  1. Selection: The agent generates a natural language query: “Tips for heating an object.” It searches its library and finds a past successful strategy.
  2. Utilization: The retrieved skill says: “Stoveburners are non-functional for plates; use microwave 1 on countertop 2.” The agent follows this advice and succeeds.
  3. Distillation: After the success, the agent “reflects” on its trajectory. It writes a condensed, reusable strategy: “When stoveburners are ineffective, the microwave is the most suitable option.”
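
Put together, one pass through this lifecycle might look like the following sketch. Here `policy`, `env`, and `library` are placeholder interfaces we invented to illustrate the flow, not the paper's actual code.

```python
def lifecycle_step(policy, env, library, task="heat a plate and put it in the cabinet"):
    # 1. Selection: the policy writes a query and retrieves a past strategy.
    query = policy(f"Write a short search query for skills useful for: {task}")
    skill = library.search(query)   # e.g. "stoveburners fail for plates; use the microwave"

    # 2. Utilization: the policy acts in the environment with the skill as a hint.
    trajectory, success = env.rollout(policy, task, hint=skill)

    # 3. Distillation: after the episode, the policy reflects and writes a
    #    condensed, reusable strategy back into the library.
    new_skill = policy(f"Condense a reusable strategy from this trajectory:\n{trajectory}")
    library.add(new_skill, outcome=success)   # success is the single training signal

    return success
```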

One Signal, Three Lessons

The technical breakthrough of Skill1 lies in how it assigns “credit” for success. It takes the simple result of a task (Success = 1, Failure = 0) and decomposes it.

The agent looks at the low-frequency trend—how well a specific skill has performed over dozens of attempts. If a skill consistently leads to success, the agent learns to prioritize it during the “Selection” phase.

Simultaneously, it looks at high-frequency variation. If a new attempt is even more successful than the library’s current best average, it provides a “bonus” signal that rewards the “Distillation” phase. This creates a “rising tide” effect: as the agent distills better skills, the library gets stronger, which in turn makes the agent better at selecting the right tool for the job.
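
A rough sketch of that credit split is shown below, using per-skill running success rates as the "low-frequency" statistic and an improvement-over-best bonus as the "high-frequency" one. The exact estimators are our guess; the paper's formulation may differ.

```python
class SkillCredit:
    """Tracks per-skill outcomes and splits the task result into two signals."""

    def __init__(self):
        self.history = {}  # skill_id -> list of 0/1 outcomes

    def record(self, skill_id, success):
        self.history.setdefault(skill_id, []).append(success)

    def selection_credit(self, skill_id):
        # Low-frequency trend: the skill's long-run success rate, used to
        # prioritize reliable skills during the Selection phase.
        outcomes = self.history.get(skill_id, [])
        return sum(outcomes) / len(outcomes) if outcomes else 0.0

    def distillation_bonus(self, success):
        # High-frequency variation: a bonus only when the latest outcome beats
        # the library's current best average, rewarding the Distillation phase
        # for strategies that raise the bar.
        best = max((self.selection_credit(s) for s in self.history), default=0.0)
        return max(0.0, success - best)
```

On a fresh library the bonus simply equals the raw outcome; as the library improves, only genuinely better strategies continue to earn it.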

Results and the Path Forward

The results are striking. In household simulations, Skill1 achieved a 97.5% success rate, significantly outperforming prior models that trained selection and distillation separately. In “WebShop,” a simulator where agents must navigate e-commerce sites to find specific products, Skill1 proved far more adept at browsing and purchasing than its predecessors.

By treating skill management as a unified lifecycle rather than a series of chores, Skill1 brings us a step closer to AI that doesn’t just work, but actually gets wiser with every task it completes.