The Ghost in the Machine: AI Models Converge on Universal “Algorithmic Cores”
In the field of artificial intelligence, there is a persistent mystery: train two copies of the same Large Language Model (LLM) architecture on the same data, starting from different random seeds, and they will end up behaving almost identically, yet their internal “wiring” (the billions of numerical weights) will look completely different. This phenomenon, known as “functional equivalence,” has long frustrated researchers trying to understand the inner workings of AI. If every model is a unique snowflake of random numbers, how can we ever truly explain how they think?
A provocative new paper by researcher Joshua S. Schiffman suggests that we’ve been looking at the wrong level of detail. The study, titled “Transformers converge to invariant algorithmic cores,” argues that beneath the chaotic surface of neural network weights, different models actually build the exact same “computational engines” to solve specific tasks. Schiffman calls these universal structures algorithmic cores.
Finding the Needle in the High-Dimensional Haystack
To find these cores, Schiffman developed a method called Algorithmic Core Extraction (ACE). Instead of examining every individual neuron, ACE identifies “subspaces”: small sets of directions within the model’s vast internal activation space that are both active during a task and essential to performing it.
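The paper’s exact procedure isn’t reproduced here, but the general recipe (find the few directions that carry the activation variance, then ablate them to confirm they are load-bearing) can be sketched in a few lines of NumPy. The function names and the variance threshold below are illustrative choices, not Schiffman’s implementation:

```python
import numpy as np

def find_active_subspace(activations, var_threshold=0.99):
    """Find a low-dimensional subspace capturing most activation variance.
    `activations` is (n_samples, hidden_dim); returns a (hidden_dim, k) basis."""
    centered = activations - activations.mean(axis=0)
    # SVD gives principal directions ordered by explained variance.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    var = s**2 / (s**2).sum()
    k = int(np.searchsorted(np.cumsum(var), var_threshold)) + 1
    return vt[:k].T  # orthonormal basis of the k-dim "active" subspace

def ablate_subspace(activations, basis):
    """Project activations off the subspace -- used to test whether the
    subspace is *essential* (task performance should collapse if it is)."""
    return activations - (activations @ basis) @ basis.T

# Toy demo: 64-dim activations that secretly live in a 3-dim subspace plus noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 3))
mixing = rng.normal(size=(3, 64))
acts = latent @ mixing + 0.01 * rng.normal(size=(500, 64))

basis = find_active_subspace(acts)
print(basis.shape[1])  # → 3: the planted 3-dimensional core is recovered
```

On real models the “task performance” check would mean re-running the network with the ablated activations, not just measuring residual norm as this toy does.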
The results were striking across three different levels of complexity:
1. The “Weather Machine” (Markov Chains) In the simplest experiment, Schiffman trained small models to predict the next step in a random sequence (like a four-state weather pattern). While the internal weights of the models were totally different, they all isolated an identical 3-dimensional core. Looking inside this core, Schiffman found that it mathematically mirrored the “ground truth” logic of the task itself. It was as if every model had independently built the same internal compass to navigate the problem.
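Three dimensions is the natural answer here: a probability distribution over four states has only three free parameters (the probabilities must sum to 1), so the predictions of any 4-state Markov chain span at most a 3-dimensional space. A quick check, using a made-up transition matrix:

```python
import numpy as np

# A hypothetical 4-state "weather" transition matrix (rows sum to 1);
# the values are invented for illustration, not taken from the paper.
T = np.array([
    [0.7, 0.2, 0.1, 0.0],
    [0.1, 0.6, 0.2, 0.1],
    [0.0, 0.3, 0.5, 0.2],
    [0.2, 0.1, 0.2, 0.5],
])

# Each row is a distribution over 4 states, so it lies on the 3-dimensional
# probability simplex: centering the rows and checking the rank shows at
# most 3 independent directions remain.
centered = T - T.mean(axis=0)
print(np.linalg.matrix_rank(centered))  # → 3
```

This matches the dimensionality of the core the models converge on: the task itself has exactly three degrees of freedom, and the models discover that.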
2. The “Aha!” Moment (Modular Addition) When AI models learn “clock math” (modular addition), they often experience a sudden leap from memorization to understanding—a process called “grokking.” Schiffman discovered that at the exact moment of grokking, a compact “rotational core” crystallizes within the model.
Interestingly, if the model continues training long after it has mastered the task, it becomes “over-educated.” It begins to spread this core across more dimensions to create redundancy. Think of it like a pilot who first learns to fly using a single control stick, but eventually builds a cockpit where ten different levers all do the same thing. The “core” remains the same, but the model becomes more robust by distributing the work.
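The “rotational core” has a concrete interpretation: modular addition is exactly composition of rotations around a circle. The toy below implements that algorithm directly in NumPy; it is a sketch of the mechanism the core is believed to encode, not the trained model’s actual weights:

```python
import numpy as np

P = 12  # "clock math" modulus -- think hours on a clock face

def embed(a):
    """Embed a residue as a point on the unit circle."""
    theta = 2 * np.pi * a / P
    return np.array([np.cos(theta), np.sin(theta)])

def rotation(b):
    """2-D rotation matrix by b steps around the clock."""
    theta = 2 * np.pi * b / P
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def mod_add(a, b):
    """Compute (a + b) mod P purely by composing rotations, then reading
    off which residue's embedding the rotated point lands on."""
    v = rotation(b) @ embed(a)            # rotate a's point by b's angle
    scores = [v @ embed(c) for c in range(P)]
    return int(np.argmax(scores))

print(mod_add(7, 8))  # → 3, since (7 + 8) mod 12 == 3
```

Rotating by `a` and then by `b` is the same as rotating by `a + b`, and angles wrap around at a full turn, which is exactly what “mod P” means geometrically.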
3. The Grammar Switch (GPT-2) Finally, Schiffman turned to GPT-2, a family of models trained on real-world text. He focused on a specific task: subject-verb agreement (e.g., knowing that “The key next to the cabinets is…” requires a singular verb).
He found that across GPT-2 Small, Medium, and Large, this entire grammatical rule is governed by a single one-dimensional axis: a “grammar switch.” By flipping this one internal switch, he could force the model to invert its logic, making it confidently write “The keys next to the cabinet is…” throughout an entire page of generated text.
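Mechanically, “flipping the switch” means intervening on a single direction in the model’s hidden states. The sketch below uses random vectors as stand-ins for GPT-2 activations and a sign-flip (reflection) along the axis; the paper’s exact intervention may differ, and the singular/plural labels are purely illustrative:

```python
import numpy as np

def flip_along_axis(h, v):
    """Reflect hidden states across a 1-D 'grammar axis' v: the component
    of h along v has its sign inverted; everything orthogonal is untouched."""
    v = v / np.linalg.norm(v)
    return h - 2.0 * (h @ v)[..., None] * v

# Toy check: states whose projection onto v is positive ("singular")
# come out with a negative projection ("plural"), and vice versa.
rng = np.random.default_rng(1)
v = rng.normal(size=16)           # hypothetical grammar-switch direction
h = rng.normal(size=(4, 16))      # hypothetical hidden states, one per token
flipped = flip_along_axis(h, v)

v_unit = v / np.linalg.norm(v)
print(np.allclose(flipped @ v_unit, -(h @ v_unit)))  # → True
```

Because the axis is one-dimensional, this is a surgical edit: one number per hidden state changes sign, and the rest of the representation is left intact.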
A New Map for AI Safety
This research suggests that mechanistic interpretability—the quest to explain AI—should stop obsessing over specific “wires” (neurons) and start looking for these universal “blueprints” (cores).
If safety-critical behaviors, such as a model’s refusal to provide dangerous information, are also governed by these compact, invariant cores, it would give researchers a “steering wheel” for AI. Instead of guessing how to prompt a model, we might one day control it by simply identifying and toggling its internal algorithmic engines. In the search for the “ghost in the machine,” we may have finally found where the machine keeps its logic.