Transformers Have a Hard Time Remembering Where Things Are
đź“„ Full Paper
đź’¬ Ask
Large language models (LLMs) have become incredibly good at understanding and interacting with human language, but they struggle with seemingly simple tasks like arithmetic. Researchers from Qualcomm AI Research have found that a key reason for these failures is that LLMs lack the ability to perform random memory access within their context window.
Imagine solving a simple arithmetic problem, like adding two numbers together. You need to look up the digits, remember their values, and then carry out the calculation. LLMs, by contrast, appear to retrieve information based on its meaning, the "content" of the tokens, rather than by remembering where in the sequence that information is located.
It is like trying to find a particular book in a library by searching only by topic, without remembering which shelf it sits on: you may find many books on that topic, but not necessarily the one you are looking for.
The researchers explored this idea by studying how LLMs perform on the parity task: given a sequence of binary digits, decide whether the number of 1s is even or odd. They found that models pre-trained on natural language struggle to generalize the parity task to longer sequences, suggesting that they rely on "content-based addressing" rather than "index-based addressing".
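For concreteness, here is a minimal Python sketch of the parity task as described above: the model is shown a string of bits and must say whether the count of 1s is even or odd. The exact prompt and label format is an illustrative assumption, not necessarily the formatting used in the paper.

```python
import random

def make_parity_example(length: int) -> tuple[str, str]:
    """Generate one parity example: a bit string and its parity label."""
    bits = [random.randint(0, 1) for _ in range(length)]
    # Parity is "odd" if the number of 1s is odd, "even" otherwise.
    label = "odd" if sum(bits) % 2 == 1 else "even"
    prompt = " ".join(str(b) for b in bits)
    return prompt, label

# Length generalization setup: train on short sequences, test on longer ones.
train_prompt, train_label = make_parity_example(length=10)
test_prompt, test_label = make_parity_example(length=40)
print(train_prompt, "->", train_label)
```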
To address this limitation, the researchers proposed a simple but effective solution called “mnemonics”. By introducing specific marker tokens, or “mnemonics”, before each bit in the input sequence, they could indirectly guide the model to perform random memory access. The mnemonics serve as placeholders that allow the model to remember where the information is located within the context window.
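The mnemonic idea can be illustrated with a small sketch. Here each bit is preceded by a distinct marker token that the model can attend to when it needs to retrieve that bit; the `<mK>` token format and the one-marker-per-position scheme are assumptions for illustration, and the paper's actual choice of marker tokens may differ.

```python
def add_mnemonics(bits: list[int]) -> str:
    """Prefix each bit with a marker token so it can be retrieved
    via the marker's content rather than via its position.
    The <mK> format is illustrative, not the paper's exact scheme."""
    tokens = []
    for i, b in enumerate(bits):
        tokens.append(f"<m{i}>")   # mnemonic marker for the i-th bit
        tokens.append(str(b))      # the bit itself
    return " ".join(tokens)

print(add_mnemonics([1, 0, 1, 1]))
# -> "<m0> 1 <m1> 0 <m2> 1 <m3> 1"
```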
This approach proved successful, allowing the model to learn and generalize the parity task to longer sequences. The researchers also found that by using “environment-forced mnemonics”, where the marker tokens were provided by the environment instead of being generated by the model, they could further improve the model’s ability to generalize the task.
These findings suggest that LLMs may need to be equipped with mechanisms for index-based addressing to perform algorithmic tasks more generally. By doing so, LLMs could unlock new capabilities and solve more complex problems that require reasoning and memory access.
This research sheds light on a fundamental limitation of LLMs and highlights a path for overcoming it. As LLMs continue to evolve, it is important to consider how they can better manage and access information within their context windows to unlock their full potential for solving more complex problems.