AI Papers Reader

Personalized digests of latest AI research

AndroidLab: A Framework for Training and Evaluating Android Agents

Building autonomous agents that can interact with real-world environments, such as mobile operating systems, remains a significant research challenge. This paper introduces ANDROIDLAB, a systematic framework for training and evaluating Android agents. It addresses the limitations of existing benchmarks, which often rely on static environments or closed-source models, by pairing a standard operational environment with open training data and a broader set of evaluation metrics.

The core of ANDROIDLAB is a standard operational environment that mimics the interactions users have with Android devices.

To address the lack of open-source training data for Android agents, the paper presents Android Instruct, a dataset with 94.3k operation records. This dataset supports both text-only and multimodal training and is used to fine-tune six open-source models, resulting in significant performance improvements.
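To make the idea of an operation record more concrete, here is a minimal sketch of how a single step might be represented and turned into either a text-only or a multimodal training sample. The field names (`instruction`, `xml_state`, `screenshot_path`, `action`) and the `to_training_sample` helper are illustrative assumptions, not the dataset's actual schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OperationRecord:
    """One hypothetical step of an agent trajectory (not the real schema)."""
    instruction: str                 # natural-language task description
    xml_state: str                   # textual (XML) description of the screen
    screenshot_path: Optional[str]   # path to a screenshot, if one was captured
    action: str                      # action taken at this step


def to_training_sample(record: OperationRecord, multimodal: bool) -> dict:
    """Build a prompt/target pair for text-only or multimodal fine-tuning."""
    sample = {
        "prompt": f"Task: {record.instruction}",
        "target": record.action,
    }
    if multimodal:
        # Multimodal samples carry the screenshot instead of the XML text.
        if record.screenshot_path is None:
            raise ValueError("multimodal sample needs a screenshot")
        sample["image"] = record.screenshot_path
    else:
        # Text-only samples append the XML screen description to the prompt.
        sample["prompt"] += f"\nScreen:\n{record.xml_state}"
    return sample
```

Under these assumptions, the same record can feed both training modes, which is one plausible way a single dataset could serve text-only and multimodal models alike.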

The evaluation metrics used in ANDROIDLAB are designed to assess several aspects of agent performance.

The results demonstrate that fine-tuning open-source models with Android Instruct significantly improves their performance across different metrics, with some models even surpassing closed-source models in certain aspects. The paper highlights the potential of open-source models for developing effective Android agents, particularly when combined with a systematic framework like ANDROIDLAB.

The authors also examine the impact of different agent frameworks, such as ReAct and SeeAct, on model performance. Their findings suggest that ReAct is particularly effective in XML mode, while SeeAct does not consistently improve performance.
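ReAct-style prompting asks the model to write out its reasoning before committing to an action. As a rough illustration, the sketch below parses such a reply into its two parts. The `Thought:`/`Action:` labels, the `tap(3)` action syntax, and the `parse_react_response` helper are assumptions made for this example, not the format used by ANDROIDLAB.

```python
import re
from typing import NamedTuple


class Step(NamedTuple):
    thought: str   # the model's reasoning for this step
    action: str    # the action the model decided to take


def parse_react_response(response: str) -> Step:
    """Split a 'Thought: ... Action: ...' reply into its two parts."""
    match = re.search(r"Thought:\s*(.*?)\s*Action:\s*(.*)", response, re.DOTALL)
    if match is None:
        raise ValueError("response does not follow the Thought/Action format")
    return Step(thought=match.group(1).strip(), action=match.group(2).strip())


# Hypothetical model reply for a single step.
reply = "Thought: The Settings icon is element 3.\nAction: tap(3)"
step = parse_react_response(reply)
print(step.action)
```

Only the parsed action would be sent to the device, while the thought can be logged for later inspection.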

Overall, ANDROIDLAB is a valuable contribution to the field of mobile agent research. It provides a standardized environment and benchmark for training and evaluating Android agents, addressing the limitations of existing solutions. The open-source nature of the framework and dataset allows for wider adoption and collaboration within the research community.