AI Papers Reader

Personalized digests of the latest AI research


Orchard: The New Open-Source Blueprint for Building AI That Actually "Does" Things

In the rapidly evolving landscape of artificial intelligence, the industry is moving beyond chatbots that merely talk, toward "agents"—AI systems that can actually execute tasks. Whether it is a software agent fixing a bug in a complex codebase or a digital assistant booking a flight through a web browser, these agents require more than just a smart "brain"; they need a safe, scalable "office" where they can work, fail, and learn from their mistakes.

Researchers from Microsoft Research have recently unveiled Orchard, an open-source framework designed to democratize the creation of these autonomous agents. While industry leaders like OpenAI and Google have long held the keys to high-performing agentic systems, Orchard provides the open-source community with the infrastructure and training recipes needed to close the gap.

The Problem: The Infrastructure Bottleneck

To build an intuition for the problem Orchard solves, imagine teaching a new employee how to manage a company’s server. You wouldn’t just give them a manual; you would give them a “sandbox”—a replica of the server where they can try commands without crashing the real business.

For AI agents, creating these sandboxes at scale is a nightmare. To train an agent to fix software, a developer must programmatically clone a repository, install dependencies, and run test suites thousands of times. Existing tools are either proprietary and expensive or so clunky that they slow down the AI’s learning process.
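To make the repetition concrete, here is a minimal Python sketch of the setup an agent trainer must run thousands of times over. The repository URL, install step, and test command are placeholder assumptions, not anything Orchard specifies:

```python
import subprocess
import tempfile

def run(cmd, cwd=None):
    """Run one shell command, returning (exit_code, stdout)."""
    proc = subprocess.run(cmd, shell=True, cwd=cwd,
                          capture_output=True, text=True)
    return proc.returncode, proc.stdout

def prepare_sandbox(repo_url, test_cmd="pytest -q"):
    """Clone a repo into a throwaway directory, install it,
    and run its test suite -- the per-task setup an agent
    training loop must repeat at scale."""
    workdir = tempfile.mkdtemp(prefix="sandbox-")
    for cmd in (f"git clone --depth 1 {repo_url} .",
                "pip install -e .",
                test_cmd):
        code, out = run(cmd, cwd=workdir)
        if code != 0:
            return False, out  # surface the failure to the caller
    return True, ""
```

Doing this sequentially, per task, is exactly the bottleneck: each iteration pays for a fresh clone, a dependency install, and a full test run before the agent has taken a single action.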

The Solution: Orchard Env

At the heart of the new framework is Orchard Env, a lightweight, Kubernetes-native service that manages these digital sandboxes. It is designed to be “harness-agnostic,” meaning it doesn’t matter what specific task the AI is doing; the environment provides a consistent way to execute commands and handle files.
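The paper's client API isn't spelled out here, but a "harness-agnostic" environment might expose an interface along these lines. All names below (`SandboxEnv`, `exec`, `read_file`, `write_file`) are illustrative assumptions, not Orchard Env's actual API, and the local implementation stands in for what would really be a container scheduled on Kubernetes:

```python
import subprocess
from dataclasses import dataclass

@dataclass
class ExecResult:
    exit_code: int
    stdout: str
    stderr: str

class SandboxEnv:
    """Illustrative harness-agnostic interface: every task, whatever
    its domain, talks to its sandbox through the same primitives."""
    def exec(self, command: str, timeout_s: float = 60.0) -> ExecResult:
        raise NotImplementedError
    def read_file(self, path: str) -> bytes:
        raise NotImplementedError
    def write_file(self, path: str, data: bytes) -> None:
        raise NotImplementedError

class LocalEnv(SandboxEnv):
    """Toy in-process implementation for illustration only; a real
    service would run these calls inside an isolated container."""
    def exec(self, command, timeout_s=60.0):
        p = subprocess.run(command, shell=True, capture_output=True,
                           text=True, timeout=timeout_s)
        return ExecResult(p.returncode, p.stdout, p.stderr)
    def read_file(self, path):
        with open(path, "rb") as f:
            return f.read()
    def write_file(self, path, data):
        with open(path, "wb") as f:
            f.write(data)
```

The point of the abstraction is that a software-engineering harness, a web-navigation harness, and a personal-assistant harness can all drive the same small surface, while the backend handles isolation and scheduling.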

Orchard Env is remarkably fast, with an average command-execution latency of just 0.28 seconds. By using “agent injection”—where the necessary tools are popped into a container only when needed—it avoids the cost of building massive, custom images for every task. The researchers found that Orchard can operate at less than half the cost of existing commercial alternatives.

Success Across Three Domains

The researchers demonstrated Orchard’s power through three specialized “recipes”:

  1. Orchard-SWE (Software Engineering): This agent was trained to resolve real-world GitHub issues. It achieved a 67.5% success rate on the SWE-bench Verified benchmark, a new state-of-the-art for open-source models of its size. Notably, the researchers used a technique called “credit-assignment SFT,” which teaches the AI to identify the “productive segments” of a task even if the overall attempt failed. It’s like a student getting partial credit for the correct steps in a math problem, allowing the model to learn much faster.
  2. Orchard-GUI (Web Navigation): This 4-billion parameter vision-language model acts as a “computer-use” agent. It can look at screenshots and navigate websites like Amazon or Coursera. Despite its small size, it outperformed many massive proprietary models, averaging a 68.4% success rate across several benchmarks.
  3. Orchard-Claw (Personal Assistant): This agent focuses on productivity workflows like managing emails and calendars, proving that skills learned in one digital “office” can transfer to others.
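The "partial credit" intuition behind credit-assignment SFT can be sketched as a per-token loss mask: only tokens inside segments judged productive contribute to the training loss. The uniform 0/1 weighting below is an assumption made for illustration, not the paper's exact scheme:

```python
def credit_assignment_weights(num_tokens, productive_spans):
    """Build a per-token loss mask: tokens inside spans judged
    'productive' get weight 1.0, all others 0.0, so correct steps
    are rewarded even when the overall trajectory failed.
    (Illustrative sketch; not the paper's exact weighting.)"""
    weights = [0.0] * num_tokens
    for start, end in productive_spans:  # spans are [start, end)
        for i in range(start, min(end, num_tokens)):
            weights[i] = 1.0
    return weights

def masked_sft_loss(token_losses, weights):
    """Average the per-token losses over only the credited tokens."""
    total = sum(l * w for l, w in zip(token_losses, weights))
    credited = sum(weights)
    return total / credited if credited else 0.0
```

With an all-or-nothing mask, a failed trajectory contributes no signal at all; masking at the segment level lets the model keep learning from the steps it got right.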

By releasing Orchard as an open-source project, the team hopes to accelerate a future where sophisticated, autonomous AI isn’t just a luxury for tech giants, but a tool accessible to every researcher and developer.