DOLPHIN: AI Takes the Helm in Scientific Research
A team of researchers from Fudan University and the Shanghai Artificial Intelligence Laboratory have unveiled DOLPHIN, a groundbreaking closed-loop, open-ended auto-research framework. This system, described in a recent preprint, marks a significant step towards fully automating the scientific research process. Unlike previous AI-assisted research tools, DOLPHIN doesn’t just augment human efforts; it independently generates research ideas, conducts experiments, analyzes results, and iteratively refines its approach based on the feedback loop.
The paper positions DOLPHIN within an evolutionary trajectory of scientific research. It begins with entirely human-driven research, progresses to AI-assisted research (e.g., using LLMs to improve efficiency), then semi-automatic research (automating specific tasks), and finally culminates in DOLPHIN’s fully automated approach.
DOLPHIN’s core functionality revolves around three key stages:
-
Idea Generation: DOLPHIN starts by retrieving relevant papers on a given topic using Semantic Scholar’s API. Instead of directly using these papers, DOLPHIN filters them based on topic and task relevance, ensuring only the most pertinent resources inform the idea generation process. This filtering step significantly improves idea quality compared to simply using all retrieved papers. Using a large language model (LLM), DOLPHIN then generates novel and independent research ideas, ensuring they haven’t been previously explored. This process includes a crucial novelty and independence check to avoid redundancy and ensure that truly novel ideas are pursued.
-
Experimental Verification: DOLPHIN translates its generated ideas into executable code, automatically debugging any errors encountered through an exception-traceback-guided process. This cleverly uses the error messages to guide the LLM in identifying the root cause within the code, leading to efficient debugging. The automated experiments are then conducted on benchmark datasets. The choice of datasets depends on the nature of the generated research idea, encompassing image classification (CIFAR-100), 3D point classification (ModelNet40), and sentiment analysis (SST-2).
-
Results Feedback: Finally, DOLPHIN analyzes the experimental results, categorizing them as improvements, maintenance, or decline relative to baseline performance. This information is fed back into the idea generation process, guiding the LLM towards generating higher-quality, more effective ideas in subsequent iterations. This iterative refinement process is key to DOLPHIN’s ability to progressively improve upon its initial ideas.
The researchers extensively evaluated DOLPHIN across diverse tasks. In 3D point classification, DOLPHIN generated methods that achieved performance comparable to state-of-the-art human-designed models, highlighting the system’s potential. In 2D image classification, it generated improvements on standard benchmarks, and similar success was observed in sentiment analysis. These results, across multiple domains, demonstrate the robustness and effectiveness of DOLPHIN’s automated workflow.
The study showcases how DOLPHIN overcomes limitations of previous auto-research systems. By incorporating a closed-loop feedback mechanism, it iteratively improves the quality of its research ideas and increases the success rate of code execution, addressing critical challenges identified by previous research in the field. While still in its early stages, DOLPHIN offers a compelling vision of the future of scientific research, where AI not only assists but takes the lead in discovering new knowledge. The authors acknowledge that challenges remain, particularly concerning the scaling up to more complex multidisciplinary research problems, but this work represents a significant leap forward in this direction.
Chat about this paper
To chat about this paper, you'll need a free Gemini API key from Google AI Studio.
Your API key will be stored securely in your browser's local storage.