When AI Remembers Too Much: New Benchmark Reveals LLMs Struggle to Read the Room
Imagine you've told your AI assistant that you enjoy a bit of dry sarcasm, love using emojis, and prefer to be addressed by your nickname, "Joker." For your daily casual chats, this works perfectly. But when you ask that same AI to draft a formal letter to the IRS to resolve a tax discrepancy, you probably don't want the message to start with: "Hey there, Financial Wizard! Hope you've got your golden star stickers ready because today's lesson is all about fixing that little tax 'oopsie'!"
This phenomenon, in which an AI's memory of your personality clashes with the social requirements of a task, is the focus of a new research paper titled "BenchPreS." Researchers from Yonsei University and LG AI Research have developed a benchmark to evaluate how well Large Language Models (LLMs) can selectively apply or suppress user preferences stored in their persistent memory.
The results? Even the world's most advanced AI models are remarkably bad at "reading the room."
The Challenge of Selective Memory
As AI assistants move toward having "persistent memory," the ability to remember facts and styles across different conversations, the goal is deep personalization. However, the researchers argue that true intelligence isn't just about remembering everything; it's about knowing when to forget.
The BenchPreS benchmark tests models across 39 different scenarios in domains like finance, law, and health. It measures two key metrics: the Appropriate Application Rate (AAR), or how often the AI uses a preference when it should, and the Misapplication Rate (MR), or how often it uses a preference when it is socially or professionally inappropriate.
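To make those two metrics concrete, here is a minimal sketch, not the authors' evaluation code, of how AAR and MR could be computed. It assumes each test case has already been judged on two points: whether applying the stored preference was appropriate in that context, and whether the model actually applied it.

```python
from dataclasses import dataclass

@dataclass
class Case:
    preference_appropriate: bool  # should the stored preference be applied in this context?
    preference_applied: bool      # did the model actually apply it (e.g., per a judge's label)?

def aar(cases: list[Case]) -> float:
    """Appropriate Application Rate: of the cases where applying the
    preference was appropriate, how often did the model apply it?"""
    eligible = [c for c in cases if c.preference_appropriate]
    return sum(c.preference_applied for c in eligible) / len(eligible)

def mr(cases: list[Case]) -> float:
    """Misapplication Rate: of the cases where applying the preference
    was inappropriate, how often did the model apply it anyway?"""
    ineligible = [c for c in cases if not c.preference_appropriate]
    return sum(c.preference_applied for c in ineligible) / len(ineligible)

# Hypothetical judged outputs: a casual chat (emojis are fine) and two formal letters (they are not).
cases = [
    Case(preference_appropriate=True,  preference_applied=True),   # casual chat, emojis used
    Case(preference_appropriate=False, preference_applied=True),   # IRS letter, emojis used anyway
    Case(preference_appropriate=False, preference_applied=False),  # loan letter, preference suppressed
]
print(f"AAR = {aar(cases):.2f}, MR = {mr(cases):.2f}")  # AAR = 1.00, MR = 0.50
```

Under this framing, an ideal model scores high on AAR while keeping MR near zero; the paper's finding is that models tend to push both numbers up together.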
Concrete Failures in "Social Intelligence"
The study provides several striking examples of current AI failures. In one instance, a model was tasked with writing to a bank loan officer. Despite the formal context, the AI used the user's preferred nickname, "Rambo," and adopted a "comedian perspective," describing a rental history as looking "as empty as a salad bar at a donut convention."
The researchers found a consistent trend: as models get better at following instructions, they actually get worse at context-aware selectivity. Instead of treating a user's preference for emojis or sarcasm as a "hint" to be used when appropriate, models tend to treat it as a "globally enforceable rule." If the memory says "use bold text," the AI uses bold text everywhere, from a grocery list to a legal document.
The "Thinking" Trap
Perhaps the most surprising finding involves the new wave of "reasoning" models. One might assume that a model that "thinks" before it speaks would realize a sarcastic tone is a bad idea for a court filing. Instead, the researchers found that reasoning often makes the problem worse.
In failure cases, the models' internal thought traces showed them treating the user's preferences like a mandatory checklist. One model specifically noted that the "school newsletter format" was inappropriate for a government document, but then proceeded to use it anyway because it viewed the user's preference as a "key requirement" that had to be satisfied.
Moving Toward Context-Aware AI
The paper concludes that current training paradigms prioritize "preference adherence" at any cost. For AI models to become truly useful agents in professional settings, they must move beyond being mere mimics of user style. They need to develop a sense of "contextual integrity": the ability to weigh a user's personality against the norms of the world around them.
Until then, users might want to be careful what they ask their AI to remember; otherwise, "Joker" might just show up to their next mortgage application.