AI-Driven Exploration in the Space of Code: AIDE Automates Machine Learning Engineering

Machine learning (ML) development is notoriously time-consuming, often involving extensive trial and error. A new paper introduces AIDE (AI-Driven Exploration), an AI agent that automates this process and significantly accelerates the development of high-performing ML models. Developed by researchers at Weco AI, AIDE frames ML engineering as a code optimization problem, leveraging large language models (LLMs) to search the space of possible code solutions efficiently.

Unlike previous AutoML approaches that rely on predefined search spaces of hyperparameters and architectures, AIDE directly searches within the space of code. This novel approach allows for greater flexibility and leverages the domain expertise embedded within LLMs, leading to more efficient exploration. AIDE uses a tree-search algorithm, where each node in the tree represents a candidate code solution and edges represent improvements. The algorithm systematically explores the tree, refining promising solutions and discarding less effective ones.
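To make the tree-search framing concrete, here is a minimal sketch of the kind of data structure such a search could operate on. It is illustrative only: the class and field names (`Node`, `metric`, `is_buggy`, and so on) are assumptions made for this sketch, not AIDE's actual implementation.

```python
# Minimal illustrative sketch of a solution tree for code-level search.
# Class and field names here are hypothetical, not AIDE's actual API.
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """One candidate solution: a complete training script plus its evaluation outcome."""
    code: str                          # candidate ML pipeline as source code
    parent: Optional["Node"] = None    # solution this one was derived from (None for a fresh draft)
    children: List["Node"] = field(default_factory=list)
    metric: Optional[float] = None     # validation score; None until evaluated
    is_buggy: bool = False             # True if the script crashed or produced no score
    error: Optional[str] = None        # captured traceback, kept as context for debugging

    def add_child(self, code: str) -> "Node":
        """Attach a refined or fixed version of this solution as a child node."""
        child = Node(code=code, parent=self)
        self.children.append(child)
        return child
```

In this picture, each edge (each `add_child` call) corresponds to one LLM-proposed change: a bug fix or an incremental improvement over the parent solution.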

AIDE’s effectiveness stems from its three core components:

  1. Search Policy: AIDE uses a hard-coded rule to decide whether to draft new code, debug existing buggy code, or improve an already working solution. This strategy ensures balanced exploration and refinement (a simplified sketch of the decision loop follows this list).

  2. Coding Operator: This component uses an LLM to propose new solutions. It employs specialized prompts for different tasks: drafting entirely new code, debugging existing code, and making incremental improvements to existing, functional code. The prompts provide the LLM with essential context (e.g., past attempts, performance metrics, error messages).

  3. Summarization Operator: To avoid exceeding the LLM’s context window, this operator summarizes relevant information from the tree-search history, such as performance metrics and high-level descriptions of past improvements, providing a concise context for each new LLM query.
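As a rough illustration of how these components could interact, the sketch below shows one search step built on the `Node` class from the earlier sketch. The helpers `llm_draft`, `llm_debug`, `llm_improve`, `summarize`, and `evaluate` are hypothetical stand-ins for AIDE's coding and summarization operators, and the specific drafting and debugging thresholds are assumptions rather than AIDE's exact policy.

```python
# Illustrative sketch of one draft/debug/improve step; not AIDE's actual code.
# llm_draft, llm_debug, llm_improve, summarize, and evaluate are hypothetical
# stand-ins for the coding and summarization operators described above.
import random
from typing import List


def search_step(tree: List[Node], num_drafts: int = 5, debug_prob: float = 0.5) -> Node:
    """Pick a parent node and an action, then generate and evaluate one new solution."""
    drafts = [n for n in tree if n.parent is None]
    buggy = [n for n in tree if n.is_buggy]
    working = [n for n in tree if n.metric is not None and not n.is_buggy]

    # Summarization operator: condense the search history (metrics, past changes)
    # into a short context string so each LLM call stays within its context window.
    context = summarize(tree)

    if len(drafts) < num_drafts or not working:
        # Not enough independent starting points yet: draft a fresh solution.
        parent, code = None, llm_draft(context)
    elif buggy and random.random() < debug_prob:
        # Occasionally revisit a buggy leaf and ask the LLM to fix it,
        # passing along the captured error message.
        parent = random.choice(buggy)
        code = llm_debug(parent.code, parent.error, context)
    else:
        # Otherwise refine the best working solution found so far (greedy improvement).
        parent = max(working, key=lambda n: n.metric)
        code = llm_improve(parent.code, context)

    node = parent.add_child(code) if parent else Node(code=code)
    node.metric, node.is_buggy, node.error = evaluate(code)  # run the script, record the outcome
    tree.append(node)
    return node
```

Repeating this step for a fixed budget of iterations and returning the node with the best metric would then give the overall search loop.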

The authors evaluated AIDE’s performance across various benchmarks, including their own internally curated Weco-Kaggle benchmark, OpenAI’s MLE-Bench, and METR’s RE-Bench.

The Weco-Kaggle benchmark comprises 63 Kaggle competitions. On a subset of 16 tabular machine learning competitions, AIDE outperformed the median human participant in 50% of the tasks and exceeded the performance of 51.38% of human participants on average. AIDE’s advantage was even more pronounced against established AutoML tools such as H2O AutoML and the popular AutoGPT agent, both of which achieved significantly lower success rates.

On MLE-Bench, a benchmark consisting of 75 real-world Kaggle competitions, AIDE demonstrated state-of-the-art results, achieving a substantially higher rate of obtaining medals compared to other LLM agents. This success is attributed to AIDE’s iterative refinement approach. The RE-Bench evaluation, focusing on more challenging AI R&D tasks, showed that AIDE can even outperform human experts within a limited timeframe, though human expertise eventually surpasses AIDE’s simpler greedy policy for more complex problems.

AIDE’s success underscores the significant potential of LLM-powered agents to automate complex, time-consuming tasks like ML engineering. By intelligently searching the space of code solutions, AIDE reduces reliance on trial and error, dramatically speeding up ML development and paving the way for accelerated innovation in AI. The public availability of AIDE’s code further contributes to this advancement, allowing the wider research community to explore and build upon this promising technology.