Your Smart Glasses Can’t Remember Where You Left Your Keys—Yet
Imagine you’re rushing out the door and can’t find your wallet. You ask your AI-powered smart glasses, “Where did I put it?” To answer, the AI can’t just look at what is currently in front of you. It must recall that three hours ago, you set the wallet on the kitchen island, moved it to the hallway table while carrying groceries, and finally tucked it into a drawer.
This kind of complex, long-term recall is exactly what people need from a personal AI assistant, yet today’s top artificial intelligence models are notoriously bad at it. To bridge this gap, researchers from The Ohio State University and Meta have introduced SuperMemory-VQA, a new benchmark dataset designed to test whether AI can actually serve as a reliable daily memory companion.
Most AI video systems today are evaluated on clips only seconds long, where the task is to identify a simple action such as a person chopping an onion. SuperMemory-VQA instead uses 52.9 hours of real-world, first-person recordings captured with smart glasses. The recordings combine synchronized video, audio transcriptions, eye-gaze tracking, and spatial movement data, and the dataset features 4,853 challenging, human-verified question-answer pairs that mimic real-world memory lapses.
Consider one of the benchmark’s concrete scenarios: a user is cooking and asks, “Was there a step I mentioned in my plan that I didn’t actually do?”
To answer, the AI cannot simply look at a single frame. It must first retrieve a conversation from hours earlier where the user discussed their recipe plan. It then has to comb through subsequent video chunks to observe the user opening the pot and eating the food, ultimately realizing the user skipped a planned 10-minute sautéing step. This requires linking speech, time, and visual evidence across completely separate moments.
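To make that chain of reasoning concrete, here is a minimal Python sketch of the final comparison step. It assumes, purely for illustration, that earlier stages have already turned the speech and the video into timestamped, normalized step labels. The `Event` class, the label strings, and the `skipped_steps` helper are all hypothetical and are not taken from the benchmark or from any model evaluated on it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    t_min: float    # minutes since the recording started
    modality: str   # "speech" (something the user said) or "video" (something observed)
    label: str      # normalized step name (hypothetical labels)


def skipped_steps(events, plan_before_min):
    """Return steps the user planned aloud but that never show up in later video.

    Speech events before `plan_before_min` are treated as the plan;
    video events at or after it are treated as what actually happened.
    """
    planned = [e.label for e in events
               if e.modality == "speech" and e.t_min < plan_before_min]
    done = {e.label for e in events
            if e.modality == "video" and e.t_min >= plan_before_min}
    return [step for step in planned if step not in done]


# Toy timeline: the plan is spoken early, the cooking is observed hours later.
events = [
    Event(5.0, "speech", "chop vegetables"),
    Event(5.5, "speech", "saute 10 minutes"),
    Event(6.0, "speech", "simmer"),
    Event(190.0, "video", "chop vegetables"),
    Event(200.0, "video", "simmer"),
]

print(skipped_steps(events, plan_before_min=60))  # ['saute 10 minutes']
```

The comparison itself is trivial. The hard part, which this sketch skips entirely, is producing reliable labels from hours of raw speech and video in the first place.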
SuperMemory-VQA categorizes everyday memory into six tasks, including object location, conversation recall, and timeline reconstruction. Crucially, the researchers designed the test to expose AI’s worst habit: hallucination. Instead of standard multiple-choice questions, each query features ordered options ranging from “correct” and “vague” to “incorrect,” along with an explicit “unanswerable” option.
This “unanswerable” choice is vital. If you ask your glasses, “What was the brand name on the stroller we passed during our walk?” and the camera only caught a blurry, unreadable glimpse of the stroller, a trustworthy AI must say, “I don’t have enough evidence to answer,” rather than confidently fabricating a brand name.
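One way to see why the extra option matters is to tally outcomes separately instead of reporting a single accuracy number. The Python sketch below is an illustration only: the outcome names and the `summarize` function are assumptions made for this example, not the benchmark's actual scoring code.

```python
from collections import Counter

OUTCOMES = ("right", "over_abstained", "hallucinated", "wrong_or_vague")


def summarize(results):
    """Tally outcome rates from (answerable, chosen) pairs.

    answerable: True if the footage really contains the evidence.
    chosen: the option the model picked, one of
            "correct", "vague", "incorrect", "unanswerable".
    """
    counts = Counter()
    total = 0
    for answerable, chosen in results:
        total += 1
        if answerable and chosen == "correct":
            counts["right"] += 1
        elif not answerable and chosen == "unanswerable":
            counts["right"] += 1            # abstaining was the right call
        elif answerable and chosen == "unanswerable":
            counts["over_abstained"] += 1   # evidence existed, model gave up
        elif not answerable:
            counts["hallucinated"] += 1     # answered without evidence
        else:
            counts["wrong_or_vague"] += 1   # answerable, but picked a worse option
    if total == 0:
        return {name: 0.0 for name in OUTCOMES}
    return {name: counts[name] / total for name in OUTCOMES}


results = [
    (True, "correct"),        # right
    (True, "unanswerable"),   # over-abstained
    (False, "unanswerable"),  # right
    (False, "incorrect"),     # hallucinated
]
print(summarize(results))
# {'right': 0.5, 'over_abstained': 0.25, 'hallucinated': 0.25, 'wrong_or_vague': 0.0}
```

Splitting the errors this way distinguishes a model that invents answers from one that gives up too easily, two failures that a single accuracy score would lump together.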
The results of the study show that today’s AI still has a long way to go. Even the most advanced models, like Google’s Gemini, topped out at just 61% accuracy on the benchmark. The models struggled heavily with tracking objects over time. They also frequently erred on the side of caution, choosing to “abstain” and declare a question unanswerable even when the evidence was clearly present in the footage.
By exposing these stark limitations, SuperMemory-VQA provides a crucial roadmap for the next generation of wearable AI, pushing developers to build systems that don’t just see the world, but truly remember it.