Robots Trained Once, Deployed Anywhere: Robot Utility Models Enable Zero-Shot Transfer to New Environments

Robot manipulation has progressed rapidly in recent years, particularly for systems trained on large amounts of data, and robots can now handle a wide range of tasks in real-world environments. However, these robots still require significant fine-tuning for each new environment, unlike language models, which can be used zero-shot in new, unseen contexts.

A new paper by researchers at New York University, Hello Robot Inc., and Meta Inc. proposes Robot Utility Models (RUMs), a framework for training robot policies once and deploying them zero-shot in new environments. RUMs are trained on a diverse set of environments and objects and can then perform tasks in novel settings with unseen objects without any further training.

The paper introduces three key innovations:

A new data collection tool: This tool, called Stick-v2, is a handheld device built from an iPhone Pro and a few low-cost materials. It’s designed to be portable, robust, and easy to use, allowing researchers to quickly collect high-quality demonstration data in diverse environments.
Multi-modal behavior learning: RUMs are trained with imitation learning algorithms that can represent multi-modal behavior, meaning demonstrations in which the same task is completed in several different ways. Combined with data from a large, diverse set of environments, this makes task execution more robust in unseen environments.
A self-critique system: The paper introduces a system that uses a multimodal language model (mLLM) to evaluate the robot’s performance in real time. When the mLLM detects a failure, the robot automatically retries the task from a new initial robot state.
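
The self-critique and retry idea can be sketched as a simple loop. This is a minimal illustration rather than the authors' implementation: `run_with_self_critique`, `execute_policy`, and `critic_says_success` are hypothetical names, and the critic is passed in as a plain function standing in for the mLLM verdict.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, TypeVar

State = TypeVar("State")


@dataclass
class RetryResult:
    success: bool
    attempts: int


def run_with_self_critique(
    initial_states: Sequence[State],
    execute_policy: Callable[[State], object],
    critic_says_success: Callable[[object], bool],
) -> RetryResult:
    """Attempt the task from each initial state until the critic accepts a rollout."""
    attempts = 0
    for state in initial_states:
        attempts += 1
        # Hypothetical: returns whatever was recorded during the attempt.
        rollout = execute_policy(state)
        # Hypothetical stand-in for asking the mLLM whether the task succeeded.
        if critic_says_success(rollout):
            return RetryResult(success=True, attempts=attempts)
    # Every initial state was tried and the critic rejected each rollout.
    return RetryResult(success=False, attempts=attempts)


# Toy stand-ins: the "policy" only succeeds from a starting height of 0.9.
result = run_with_self_critique(
    initial_states=[0.7, 0.8, 0.9],
    execute_policy=lambda height: {"start_height": height},
    critic_says_success=lambda rollout: rollout["start_height"] == 0.9,
)
print(result)
```

Passing the policy and critic in as callables keeps the retry logic separate from any particular robot or model, so the loop can be exercised with stubs like the ones above.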

Concrete Examples:

The researchers trained five RUMs, each for a specific task, such as opening doors and drawers, picking up bags, and reorienting fallen objects.

To train the door-opening RUM, they collected data from various environments, including cabinets, microwaves, and refrigerators, using Stick-v2. The robot was trained to understand different door handle types and apply the correct approach for each.
The bag-pickup RUM was trained on a diverse set of bags, from paper bags to grocery bags, in various environments. It learned to identify the optimal grasping points and lifting strategy for each type of bag.

Results:

On average, the RUMs achieved a 90% success rate in novel environments with unseen objects. They also succeeded on different robot and camera setups without any additional data, training, or fine-tuning.

Key Findings:

Data diversity is crucial for successful zero-shot deployment. The more varied the training environments and objects, the better the RUM’s ability to generalize to new settings.
The choice of training algorithm and policy class matters less than the quality and diversity of training data.
While the quality of training data matters, a combination of expert and non-expert data can often lead to better performance.
Using an mLLM for self-critique and retrying can significantly improve the performance of RUMs.
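
One way to see why retrying helps is a back-of-the-envelope calculation. The sketch below assumes that attempts are independent and that the critic never misjudges a rollout, neither of which the article states; the function name and the example numbers are illustrative and are not results from the paper.

```python
def success_after_retries(p_single: float, max_attempts: int) -> float:
    """Probability that at least one of max_attempts independent tries succeeds.

    Assumes each try succeeds with probability p_single and that failures
    are always detected, so a retry is triggered exactly when it is needed.
    """
    if not 0.0 <= p_single <= 1.0:
        raise ValueError("p_single must be in [0, 1]")
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    # The task fails overall only if every single attempt fails.
    return 1.0 - (1.0 - p_single) ** max_attempts


# Illustrative numbers only: a policy that succeeds half the time per try.
for attempts in (1, 2, 3):
    print(attempts, success_after_retries(0.5, attempts))
```

In practice, retries from similar starting states are correlated and a critic can be wrong, so the real gain would typically be smaller than this idealized estimate.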

The paper highlights the importance of focusing on data collection and developing robust training systems for creating general-purpose, zero-shot deployable robots. This opens up new opportunities for robots to operate in dynamic, unpredictable environments without the need for extensive human intervention.

The authors have open-sourced their code, data, and models to encourage further research and development of RUMs for a wider variety of tasks. This work represents a significant step towards creating robots that can truly adapt to new situations and perform tasks in real-world settings with minimal human effort.
