Robots Trained Once, Deployed Anywhere: Robot Utility Models Enable Zero-Shot Transfer to New Environments
📄 Full Paper
💬 Ask
Robot manipulation, particularly using large amounts of training data, has shown incredible progress in recent years, allowing robots to successfully handle a wide range of tasks in real-world environments. However, these robots still require significant finetuning for each new environment, unlike language models that can be used zero-shot in new, unseen contexts.
A new paper by researchers at New York University, Hello Robot Inc., and Meta Inc., proposes Robot Utility Models (RUMs), a framework for training robots once and deploying them zero-shot in new environments. RUMs are trained on a diverse set of environments and objects and can then perform tasks in novel settings with unseen objects without any further training.
The paper introduces three key innovations:
- A new data collection tool: This tool, called Stick-v2, is a handheld device built from an iPhone Pro and a few low-cost materials. It’s designed to be portable, robust, and easy to use, allowing researchers to quickly collect high-quality demonstration data in diverse environments.
- Multi-modal behavior learning: RUMs are trained using multi-modal imitation learning, leveraging data from various sources like cameras, sensors, and robot state. This allows for a more robust understanding of tasks and their successful execution in unseen environments.
- A self-critique system: The paper introduces a system that uses a multimodal language model (mLLM) to evaluate the robot’s performance in real-time. This system helps the robot detect errors and automatically retry the task with a new initial robot state.
Concrete Examples:
The researchers trained five RUMs for specific tasks, such as opening drawers, picking up bags, and reorienting fallen objects.
- To train the door-opening RUM, they collected data from various environments, including cabinets, microwaves, and refrigerators, using Stick-v2. The robot was trained to understand different door handle types and apply the correct approach for each.
- The bag-pickup RUM was trained on a diverse set of bags, from paper bags to grocery bags, in various environments. It learned to identify the optimal grasping points and lifting strategy for each type of bag.
Results:
The RUMs, on average, achieved 90% success rate in unseen, novel environments with unseen objects. They were also successful in different robot and camera setups without any further data, training, or fine-tuning.
Key Findings:
- Data diversity is crucial for successful zero-shot deployment. The more varied the training environments and objects, the better the RUM’s ability to generalize to new settings.
- The choice of training algorithm and policy class matters less than the quality and diversity of training data.
- While the quality of training data matters, a combination of expert and non-expert data can often lead to better performance.
- Using an mLLM for self-critique and retrying can significantly improve the performance of RUMs.
The paper highlights the importance of focusing on data collection and developing robust training systems for creating general-purpose, zero-shot deployable robots. This opens up new opportunities for robots to operate in dynamic, unpredictable environments without the need for extensive human intervention.
The authors have open-sourced their code, data, and models to encourage further research and development of RUMs for a wider variety of tasks. This work represents a significant step towards creating robots that can truly adapt to new situations and perform tasks in real-world settings with minimal human effort.
Chat about this paper
To chat about this paper, you'll need a free Gemini API key from Google AI Studio.
Your API key will be stored securely in your browser's local storage.