NORA: New AI Model Shrinks Robotic Brains While Boosting Real-World Performance
A team of researchers has unveiled NORA, a compact yet powerful AI model designed to drive robots in real-world environments. The model, detailed in a recently published paper, promises to bring advanced robotic capabilities to systems with limited computing power.
Current state-of-the-art AI models for robotics, known as Vision-Language-Action (VLA) models, are large and complex. While they exhibit impressive reasoning skills and can perform intricate tasks, their substantial computational demands make them unsuitable for many practical applications where speed and efficiency are crucial. These models can require powerful GPUs to function, limiting their use in real-time robotic systems.
NORA addresses this limitation with a novel approach. The researchers designed a 3-billion-parameter VLA model, significantly smaller than its counterparts, which can balloon past 7 billion parameters. This reduction in size dramatically lowers the computational burden, enabling robots to operate more efficiently and responsively.
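A rough, back-of-envelope calculation (illustrative arithmetic, not a measurement from the paper) shows why the parameter count matters for consumer hardware:

```python
# Weight-memory comparison at bfloat16 precision (2 bytes per parameter).
# Illustrative figures only, not measurements from the paper.
for name, params in [("NORA (3B)", 3e9), ("typical large VLA (7B)", 7e9)]:
    print(f"{name}: ~{params * 2 / 1e9:.0f} GB of weights")

# NORA (3B): ~6 GB of weights       -> leaves headroom on a 16 GB consumer GPU
# typical large VLA (7B): ~14 GB    -> weights alone nearly fill the same card
```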
“The goal was to create a model that could be fine-tuned on consumer-grade GPUs while maintaining high task performance,” says Chia-Yu Hung, one of the paper’s lead authors. “Existing models are just too computationally intensive for many real-world robotic applications.”
The team built NORA on the Qwen-2.5-VL-3B multimodal model, a robust base that excels at understanding visual and semantic information. On top of that backbone, NORA uses the FAST+ tokenizer, which encodes continuous robot actions as compact sequences of discrete tokens, keeping action generation efficient. The team then trained NORA on a dataset of 970,000 real-world robot demonstrations, teaching it to translate visual input and language instructions into robotic actions.
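The article doesn't spell out how FAST+ works internally, but the general idea behind frequency-space action tokenization can be sketched in a few lines. Everything below, from the chunk length to the quantization scale, is an illustrative assumption rather than NORA's actual configuration:

```python
import numpy as np
from scipy.fft import dct, idct

def tokenize_actions(chunk: np.ndarray, scale: float = 10.0) -> np.ndarray:
    """Compress a (timesteps, dof) action chunk into integer tokens.

    A discrete cosine transform along time concentrates the signal in a few
    low-frequency coefficients; rounding then zeroes out most of the rest,
    which is what makes the token stream short and compressible.
    """
    coeffs = dct(chunk, axis=0, norm="ortho")
    return np.round(coeffs * scale).astype(int)

def detokenize_actions(tokens: np.ndarray, scale: float = 10.0) -> np.ndarray:
    """Undo quantization and the DCT to recover a smooth action chunk."""
    return idct(tokens.astype(float) / scale, axis=0, norm="ortho")

# Round-trip a 16-step chunk of 7-DoF arm actions: reconstruction error
# stays small while most tokens round to zero.
chunk = np.cumsum(np.random.randn(16, 7) * 0.01, axis=0)
tokens = tokenize_actions(chunk)
print("zero tokens:", (tokens == 0).mean(),
      "max error:", np.abs(chunk - detokenize_actions(tokens)).max())
```

Pipelines like FAST typically compress the quantized coefficients further with a byte-pair-style encoding, so the policy generates far fewer tokens per motion than it would if it predicted raw per-timestep actions.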
For example, consider a robot tasked with “picking up a red apple.” A larger, more complex model might analyze the scene in painstaking detail, processing every object and spatial relationship, potentially slowing down the robot’s response. NORA, on the other hand, can quickly identify the apple, determine the appropriate grasping action, and execute the task with minimal delay.
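In code, that control step is essentially one short generation call on the vision-language backbone. The sketch below uses the public Qwen-2.5-VL-3B checkpoint as a stand-in, since the article doesn't describe NORA's released inference API; mapping the generated tokens back to joint commands would go through a FAST+-style detokenizer like the one sketched above.

```python
import torch
from PIL import Image
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration

# Stand-in checkpoint: NORA builds on Qwen-2.5-VL-3B, so we load the public
# base model here; fine-tuned NORA weights would be loaded the same way.
name = "Qwen/Qwen2.5-VL-3B-Instruct"
processor = AutoProcessor.from_pretrained(name)
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    name, torch_dtype=torch.bfloat16, device_map="auto"
)

# One control step: current camera frame + language instruction in,
# a short burst of generated tokens out.
messages = [{"role": "user", "content": [
    {"type": "image"},
    {"type": "text", "text": "pick up the red apple"},
]}]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=[prompt], images=[Image.open("frame.png")],
                   return_tensors="pt").to(model.device)

with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=32)

# Everything after the prompt is the generated action-token sequence; a
# FAST+-style detokenizer would map it back to continuous joint commands.
action_tokens = output[0, inputs["input_ids"].shape[1]:]
print(processor.decode(action_tokens))
```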
The paper showcases how NORA outperforms existing large-scale VLA models in various robotic tasks. In one example, NORA demonstrated superior performance in “out-of-domain” object grasping, where the robot had to pick up objects it hadn’t encountered during training. In experiments, NORA achieved a success rate of 90% when asked to “put the carrot in pot” and “put the banana in pot”.
NORA’s smaller size also enables faster fine-tuning, allowing users to customize the model for specific tasks and environments without requiring extensive computational resources. This ease of adaptation makes NORA a versatile tool for a wide range of robotic applications, from manufacturing and logistics to healthcare and assistive robotics.
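For a concrete picture of what fine-tuning on a consumer GPU can look like, here is a minimal sketch using LoRA adapters, a standard parameter-efficient recipe; the article doesn't state NORA's actual fine-tuning setup, and the base checkpoint and target modules below are assumptions:

```python
import torch
from peft import LoraConfig, get_peft_model
from transformers import Qwen2_5_VLForConditionalGeneration

# Load the 3B backbone in bfloat16; small enough that the weights plus
# LoRA optimizer state fit in a single consumer GPU's memory.
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2.5-VL-3B-Instruct", torch_dtype=torch.bfloat16
)

# Train low-rank adapters on the attention projections only (assumed
# targets); the frozen base model stays untouched.
config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, config)
model.print_trainable_parameters()  # typically well under 1% of the 3B weights
```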
The researchers have open-sourced the NORA framework, including model checkpoints, training strategies, and evaluation protocols, making it available to the wider research community. This move aims to foster further innovation in the field of scalable and efficient VLA models for robotics.
While NORA represents a significant step forward, the researchers acknowledge that there is still room for improvement. Future work will focus on enhancing NORA’s ability to handle more complex multi-object manipulation tasks and improving its robustness in dynamic and unpredictable environments.