New AI System Automates Software Environment Setup
SAN FRANCISCO, CA – Setting up the right software environment for a project can be a tedious and time-consuming task for developers. Researchers from JetBrains have developed a novel system called PIPER that uses on-device reinforcement learning to automate this process, achieving results competitive with much larger and more expensive AI models.
The core challenge PIPER addresses is the automatic configuration of software projects. This process, known as environment setup, involves ensuring all necessary dependencies and settings are in place for code to run correctly. While large language models (LLMs) have shown promise in various software engineering tasks, automating environment setup has proven difficult, even for state-of-the-art models.
PIPER tackles this by combining two key AI techniques: supervised fine-tuning (SFT) and reinforcement learning with verifiable rewards (RLVR).
Supervised Fine-Tuning: Learning from Examples
First, PIPER undergoes supervised fine-tuning. In this stage, a smaller LLM, referred to as the "student" model, learns to mimic the behavior of a larger, more capable "teacher" LLM. This is achieved by training the student on a dataset of successful environment setup scripts generated by the teacher model. For instance, if a teacher model generates a script that sets up a Python project by first installing pyenv, then setting the Python version to 3.7.4, and finally installing dependencies from requirements.txt, the student model learns to produce similar, correct scripts. This stage helps the model understand the basic structure and syntax of environment setup scripts.
Reinforcement Learning: Refining with Rewards
Following SFT, PIPER employs reinforcement learning with verifiable rewards (RLVR). In this phase, the model generates environment setup scripts, and a specialized reward function, called "LLM-as-a-Judge," evaluates their correctness. This judge function is designed to mimic the actual verification process that would occur in a real development environment. For example, if a script fails to install a crucial dependency like numpy (which is required by the project's setup.py file), the LLM-as-a-Judge would identify this as an error. The reward function assigns a score based on whether the script successfully installs dependencies and resolves any potential issues flagged by static analysis tools like Pyright.
A concrete example of this could be a project that requires a specific version of a library, say pandas==1.5.0. If PIPER generates a script that installs pandas without specifying the version, or installs an incompatible version, the LLM-as-a-Judge would detect this error and provide a negative reward. Conversely, a script that correctly identifies and installs pandas==1.5.0 would receive a positive reward, guiding PIPER to improve its script generation over time.
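As a toy illustration of this particular check, a verifier could simply test whether the generated script pins the required version. The function name and the +1/-1 scoring below are assumptions for illustration, not part of PIPER.

```python
import re

def pinned_version_reward(script: str, package: str, required_version: str) -> float:
    """Illustrative sketch: reward +1 if the generated script installs the
    exact version the project requires (e.g. pandas==1.5.0), -1 otherwise."""
    pattern = rf"pip\s+install\s+[^\n]*\b{re.escape(package)}=={re.escape(required_version)}\b"
    return 1.0 if re.search(pattern, script) else -1.0

# Example usage with the scenario from the article:
good = "pip install pandas==1.5.0\n"
bad = "pip install pandas\n"
assert pinned_version_reward(good, "pandas", "1.5.0") == 1.0
assert pinned_version_reward(bad, "pandas", "1.5.0") == -1.0
```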
Performance and Efficiency
The researchers evaluated PIPER, specifically a Qwen3-8B model fine-tuned with this approach, on benchmarks like EnvBench-Python and Repo2Run. The results show that PIPER performs comparably to significantly larger and more resource-intensive models such as GPT-4o and Qwen3-32B, even outperforming them in some scenarios. Crucially, PIPER achieves this on-device, meaning it can run on consumer-grade hardware, democratizing access to sophisticated environment setup automation.
The study also highlights PIPER’s generalization capabilities, demonstrating its effectiveness not only on environment setup tasks but also on broader command-line interaction challenges. This suggests that the training process enhances the model’s fundamental ability to understand and execute instructions within a terminal environment.
The code and trained models for PIPER are publicly available, aiming to foster further research and development in this area. This advancement could significantly streamline software development workflows and improve the scalability of AI-driven software engineering benchmarks.