
Agent Lightning: A New Framework for Training AI Agents with Reinforcement Learning

Microsoft researchers have unveiled “Agent Lightning,” a novel framework designed to train AI agents using reinforcement learning (RL) without requiring significant modifications to existing agent code. The framework aims to bridge the gap between the rapidly evolving field of AI agents and the powerful optimization capabilities of RL.

Traditionally, integrating RL with complex AI agents, which often involve multiple Large Language Model (LLM) calls and interactions with external tools, has been a significant challenge. Existing methods typically either couple the RL training loop tightly to the agent’s execution logic or flatten entire episodes into one long token sequence with loss masking (sketched below). Agent Lightning sidesteps both by completely decoupling agent execution from RL training.
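To make that baseline concrete, here is a minimal sketch of the conventional concatenate-and-mask approach, assuming a simple token-segment representation; all names and token values are illustrative, not taken from the paper:

```python
# Sketch of the conventional "concatenate and mask" approach: every LLM
# call in an episode is flattened into one token sequence, and a loss
# mask selects only the model-generated spans for training.

def build_masked_sequence(turns):
    """turns: list of (token_ids, is_model_output) segments for one episode."""
    tokens, loss_mask = [], []
    for segment, is_model_output in turns:
        tokens.extend(segment)
        # Train only on model output; mask out prompts and tool results.
        loss_mask.extend([1 if is_model_output else 0] * len(segment))
    return tokens, loss_mask

episode = [
    ([101, 2023, 2003], False),   # user prompt
    ([4521, 1996, 9207], True),   # LLM-generated search query
    ([3075, 2951], False),        # retrieved tool output
    ([1996, 3437], True),         # LLM-generated final answer
]
tokens, mask = build_masked_sequence(episode)
print(mask)  # [0, 0, 0, 1, 1, 1, 0, 0, 1, 1]
```

The longer an episode and the more tools involved, the more brittle this flattening becomes, which is precisely the coupling Agent Lightning is built to avoid.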

The core of Agent Lightning’s innovation lies in its Training-Agent Disaggregation architecture. This design separates the computationally intensive RL training component (the “trainer”) from the agent’s execution logic (the “rollout”): the RL framework focuses on optimizing the LLM’s weights, while the agent runs independently and unchanged, leaving developers free to build agents however they choose.
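The paper presents this separation architecturally; the sketch below is one hypothetical way to picture it, with the trainer owning the weights behind a generate() call and the agent reduced to ordinary application code. Every class and function name here is an assumption, not the framework’s API:

```python
from dataclasses import dataclass, field

@dataclass
class Trainer:
    """Training side: owns the LLM weights and the RL update loop."""
    weights_version: int = 0
    trajectories: list = field(default_factory=list)

    def generate(self, prompt: str) -> str:
        # Serve a completion from the current policy (stubbed out here).
        return f"<completion@v{self.weights_version}: {prompt[:30]}...>"

    def update(self) -> None:
        # Placeholder for a real RL step over the collected trajectories.
        self.weights_version += 1
        self.trajectories.clear()

def run_agent(llm_generate, question: str) -> list:
    """Rollout side: arbitrary agent logic that only sees a generate() call."""
    query = llm_generate(f"Write a search query for: {question}")
    answer = llm_generate(f"Answer {question!r} using results for {query!r}")
    return [("gen_query", query), ("gen_answer", answer)]

trainer = Trainer()
trainer.trajectories.append(run_agent(trainer.generate, "Who wrote Dune?"))
trainer.update()  # the policy advances; the agent code never changes
```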

To achieve this decoupling, Agent Lightning formulates agent execution as a Markov Decision Process (MDP). This allows for a unified data interface that captures agent trajectories as sequences of states, actions, and rewards, regardless of the underlying agent framework. For instance, consider an agent designed to answer questions by retrieving information from a database. Agent Lightning can capture the entire process: the LLM generating a search query, the database interaction to fetch relevant passages, and the LLM generating the final answer. Each of these steps becomes a transition in the MDP, providing structured data for RL training.
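In code, that unified interface can be pictured as a flat list of transitions, one per LLM call, with tool interactions folded into the next state. The schema below is a minimal sketch under that assumption, not the paper’s exact data format:

```python
from dataclasses import dataclass

@dataclass
class Transition:
    state: str     # the context the LLM saw for this call
    action: str    # what the LLM generated
    reward: float  # per-step reward, often 0 until the final step

# The retrieval example above as an MDP trajectory:
trajectory = [
    Transition(
        state="Q: Who wrote Dune? Write a search query.",
        action="author of the novel Dune",
        reward=0.0,
    ),
    Transition(
        state="Q: Who wrote Dune? Passages: [...]. Answer the question.",
        action="Frank Herbert",
        reward=1.0,  # terminal reward: the final answer was judged correct
    ),
]
print(sum(t.reward for t in trajectory))  # episode return: 1.0
```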

The framework also introduces LightningRL, a hierarchical RL algorithm. LightningRL first assigns credit for the overall task outcome to the individual transitions within an agent’s execution, then applies existing single-turn RL methods to update the LLM on each transition. This approach is exemplified by a text-to-SQL task in which an agent must generate a SQL query, execute it, and present the result: Agent Lightning breaks the multi-step process into discrete transitions, letting the RL algorithm learn from each stage, such as improving the accuracy of the generated SQL.
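A hedged sketch of those two levels might look as follows; the uniform credit rule is an illustrative assumption, and the single-turn update is a stand-in for whatever RL method (e.g., a policy-gradient step) is plugged in:

```python
def assign_credit(trajectory, episode_reward):
    """Level 1: turn one episode-level reward into per-transition samples."""
    return [(state, action, episode_reward) for state, action in trajectory]

def single_turn_update(samples):
    """Level 2: stand-in for an existing single-turn RL method operating on
    independent (prompt, response, reward) triples."""
    for prompt, response, reward in samples:
        print(f"update with reward={reward:+.1f} on {response!r}")

# A text-to-SQL episode as two transitions: generate SQL, then present it.
trajectory = [
    ("Schema: users(id, name). Question: how many users are there?",
     "SELECT COUNT(*) FROM users;"),
    ("Query result: 42. Present the answer.",
     "There are 42 users."),
]
episode_reward = 1.0  # the executed SQL produced the correct result
single_turn_update(assign_credit(trajectory, episode_reward))
```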

Agent Lightning’s flexibility is further demonstrated by its ability to integrate with various agent development frameworks, including LangChain, OpenAI Agents SDK, and AutoGen, with little or no code modification. The system comprises a Lightning Server that manages the RL training process and a Lightning Client that runs the agent and collects data. This client-side approach lets existing observability frameworks such as OpenTelemetry capture rich execution traces, further enriching the training data.
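The trace-capture side can be illustrated with the real OpenTelemetry API: wrap each LLM call in a span so an otherwise unmodified agent emits the prompt/response pairs the trainer needs. The span and attribute names below are hypothetical, not Agent Lightning’s actual instrumentation:

```python
from opentelemetry import trace

tracer = trace.get_tracer("agent-rollout")

def call_llm(prompt: str) -> str:
    # Stand-in for the agent's existing LLM call (any framework).
    return f"<completion for: {prompt[:30]}...>"

def traced_llm_call(prompt: str) -> str:
    """Record prompt and response on a span so the execution trace can
    later be converted into MDP transitions for training."""
    with tracer.start_as_current_span("llm.call") as span:
        response = call_llm(prompt)
        span.set_attribute("llm.prompt", prompt)
        span.set_attribute("llm.response", response)
        return response

print(traced_llm_call("Generate a SQL query: how many users signed up?"))
```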

Experiments in the paper demonstrate Agent Lightning’s effectiveness across diverse tasks, including text-to-SQL generation, retrieval-augmented generation (RAG), and math question answering with tool use. In all cases, Agent Lightning achieved stable, continuous performance improvements, suggesting strong potential for real-world agent training and deployment. The framework’s ability to adapt to different agent architectures and optimize performance without intrusive code changes makes it a significant advance toward more capable and robust AI agents.