AI Papers Reader

Personalized digests of latest AI research

View on GitHub

Robots Trained Once, Deployed Anywhere: Robot Utility Models Enable Zero-Shot Transfer to New Environments

Robot manipulation, particularly using large amounts of training data, has shown incredible progress in recent years, allowing robots to successfully handle a wide range of tasks in real-world environments. However, these robots still require significant finetuning for each new environment, unlike language models that can be used zero-shot in new, unseen contexts.

A new paper by researchers at New York University, Hello Robot Inc., and Meta Inc., proposes Robot Utility Models (RUMs), a framework for training robots once and deploying them zero-shot in new environments. RUMs are trained on a diverse set of environments and objects and can then perform tasks in novel settings with unseen objects without any further training.

The paper introduces three key innovations:

  • A new data collection tool: This tool, called Stick-v2, is a handheld device built from an iPhone Pro and a few low-cost materials. It’s designed to be portable, robust, and easy to use, allowing researchers to quickly collect high-quality demonstration data in diverse environments.
  • Multi-modal behavior learning: RUMs are trained using multi-modal imitation learning, leveraging data from various sources like cameras, sensors, and robot state. This allows for a more robust understanding of tasks and their successful execution in unseen environments.
  • A self-critique system: The paper introduces a system that uses a multimodal language model (mLLM) to evaluate the robot’s performance in real-time. This system helps the robot detect errors and automatically retry the task with a new initial robot state.

Concrete Examples:

The researchers trained five RUMs for specific tasks, such as opening drawers, picking up bags, and reorienting fallen objects.

  • To train the door-opening RUM, they collected data from various environments, including cabinets, microwaves, and refrigerators, using Stick-v2. The robot was trained to understand different door handle types and apply the correct approach for each.
  • The bag-pickup RUM was trained on a diverse set of bags, from paper bags to grocery bags, in various environments. It learned to identify the optimal grasping points and lifting strategy for each type of bag.

Results:

The RUMs, on average, achieved 90% success rate in unseen, novel environments with unseen objects. They were also successful in different robot and camera setups without any further data, training, or fine-tuning.

Key Findings:

  • Data diversity is crucial for successful zero-shot deployment. The more varied the training environments and objects, the better the RUM’s ability to generalize to new settings.
  • The choice of training algorithm and policy class matters less than the quality and diversity of training data.
  • While the quality of training data matters, a combination of expert and non-expert data can often lead to better performance.
  • Using an mLLM for self-critique and retrying can significantly improve the performance of RUMs.

The paper highlights the importance of focusing on data collection and developing robust training systems for creating general-purpose, zero-shot deployable robots. This opens up new opportunities for robots to operate in dynamic, unpredictable environments without the need for extensive human intervention.

The authors have open-sourced their code, data, and models to encourage further research and development of RUMs for a wider variety of tasks. This work represents a significant step towards creating robots that can truly adapt to new situations and perform tasks in real-world settings with minimal human effort.