Cognitive Biases in AI: Pretraining Holds the Key, Not Fine-Tuning
Large language models (LLMs) increasingly exhibit human-like cognitive biases that can lead to irrational decision-making. A recent study published at COLM 2025, “Planted in Pretraining, Swayed by Finetuning: A Case Study on the Origins of Cognitive Biases in LLMs,” by Itay Itzhak, Yonatan Belinkov, and Gabriel Stanovsky, sheds light on where these biases originate in the LLM development pipeline. The research suggests that the foundational pretraining phase, rather than the subsequent fine-tuning process, is the primary driver of these biases.
The study employed a novel two-step experimental approach to disentangle the influences of pretraining, fine-tuning data, and training randomness. First, the researchers fine-tuned models multiple times with different random seeds, a process akin to slightly varying the ingredients or oven temperature when baking cookies. This step was designed to isolate how training randomness alone affects more than 30 different cognitive biases. Second, they introduced “cross-tuning,” in which instruction datasets were swapped between models, allowing them to test whether biases depend on the specific data used for fine-tuning.
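The sketch below illustrates this two-step design in Python. Everything in it is a hypothetical stand-in rather than the authors’ actual code: the helper functions finetune and measure_biases, the base-model and dataset names, and the list of biases are placeholders for a real training and evaluation pipeline.

```python
# Hypothetical sketch of the two-step design: (1) repeat fine-tuning with
# different random seeds, (2) "cross-tune" by swapping instruction datasets
# between pretrained backbones. All names and helpers are placeholders.
from itertools import product
import random

PRETRAINED_BASES = ["base_A", "base_B"]                        # two different pretrained models
INSTRUCTIONS = {"base_A": "dataset_A", "base_B": "dataset_B"}  # each base's usual instruction data
SEEDS = [0, 1, 2]
BIASES = ["framing_effect", "anchoring", "certainty_effect"]   # a few of the 30+ biases tested

def finetune(base: str, instruction_set: str, seed: int) -> str:
    # Stand-in for an actual fine-tuning run; returns an identifier for the tuned model.
    return f"{base}+{instruction_set}+seed{seed}"

def measure_biases(model: str) -> dict[str, float]:
    # Stand-in for the bias benchmark; returns one score per cognitive bias.
    rng = random.Random(model)
    return {bias: rng.random() for bias in BIASES}

# Step 1: same pretraining and same data, different seeds ->
# how much does training randomness alone move the bias scores?
seed_runs = {
    (base, seed): measure_biases(finetune(base, INSTRUCTIONS[base], seed))
    for base, seed in product(PRETRAINED_BASES, SEEDS)
}

# Step 2 (cross-tuning): each base is fine-tuned on the *other* base's
# instruction data -> do bias patterns follow the base or the data?
cross_runs = {
    base: measure_biases(finetune(base, INSTRUCTIONS[other], seed=0))
    for base in PRETRAINED_BASES
    for other in PRETRAINED_BASES if other != base
}
```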
Key Findings:
The findings reveal that while training randomness introduces some minor fluctuations in bias, the core patterns of biases are largely determined during the pretraining stage. Models with the same foundational pretraining, even when finetuned on different datasets, tended to exhibit similar bias patterns. Conversely, models that shared the same fine-tuning data but originated from different pretrained models showed more distinct biases.
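One way to picture this comparison: treat each fine-tuned model’s bias scores as a vector and check which models end up most similar. The sketch below uses made-up scores purely for illustration; only the pattern (same-pretraining pairs correlating more strongly than same-data pairs) mirrors the paper’s finding.

```python
# Illustrative comparison of bias "fingerprints": do models cluster by
# pretrained base or by fine-tuning data? The scores below are invented.
from math import sqrt

def pearson(x: list[float], y: list[float]) -> float:
    # Plain Pearson correlation between two equal-length score vectors.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical bias-score vectors (one score per bias) for four fine-tuned
# models, keyed by (pretrained base, instruction dataset).
bias_scores = {
    ("base_A", "dataset_A"): [0.8, 0.2, 0.6],
    ("base_A", "dataset_B"): [0.7, 0.3, 0.6],   # same base, swapped data
    ("base_B", "dataset_B"): [0.3, 0.7, 0.1],
    ("base_B", "dataset_A"): [0.4, 0.6, 0.2],   # same base, swapped data
}

same_pretraining = pearson(bias_scores[("base_A", "dataset_A")],
                           bias_scores[("base_A", "dataset_B")])
same_finetuning_data = pearson(bias_scores[("base_A", "dataset_A")],
                               bias_scores[("base_B", "dataset_A")])
print(f"same pretraining, different data: r = {same_pretraining:.2f}")
print(f"same data, different pretraining: r = {same_finetuning_data:.2f}")
# The paper's finding corresponds to the first correlation being consistently higher.
```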
Concrete Examples and Intuition:
Imagine two AI models, Model A and Model B. Both start with the same fundamental knowledge base, like having learned basic grammar and facts about the world during their “pre-school” (pretraining).
- Scenario 1: Randomness in Fine-tuning: Suppose you fine-tune both Model A and Model B on the same set of instructions for a task (e.g., how to write a product review), but with different random seeds. The resulting reviews might show minor variations, one model sounding slightly more enthusiastic than the other, simply because of randomness in training. Their core understanding of what constitutes a “good” review, however, would remain similar because they started from the same pretraining. The study found that while these minor fluctuations occur, they don’t fundamentally change a model’s underlying biases.
- Scenario 2: Cross-Tuning (Pretraining vs. Fine-tuning Data): Now consider a different experiment. Take Model A (pretrained on general internet text) and Model C (pretrained on a specific dataset of legal documents).
- The study’s “cross-tuning” approach is like swapping their job training: fine-tune Model A (the generally pretrained model) on instructions meant for legal analysis, and Model C (the legally pretrained model) on instructions for general writing, then compare their behavior.
- In line with the study’s results, even with swapped instructions, Model C (the one with the legal pretraining) would still exhibit biases rooted in its legal pretraining, and Model A (with general pretraining) would show biases more typical of general language. This indicates that the initial “education” (pretraining) has a stronger, lasting impact on a model’s “worldview” and decision-making tendencies (its biases) than the specific “job training” (fine-tuning).
For instance, if a model exhibits a “framing effect”, where its response changes based on how information is presented (e.g., preferring a treatment described as having a “90% survival rate” over one with a “10% mortality rate,” even though the two descriptions are logically equivalent), this tendency is likely embedded during its initial pretraining. Fine-tuning might subtly alter how the bias is expressed, but it typically won’t erase it or fundamentally change its nature.
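A simple way to see how such a bias can be probed: pose the same decision in two logically equivalent framings and check whether the model’s answer flips. The snippet below is only an illustrative sketch; ask_model is a hypothetical placeholder for whatever inference call the tested model actually exposes.

```python
# Illustrative framing-effect probe: the same choice phrased in terms of
# survival (gain frame) and mortality (loss frame). An unbiased model should
# answer both frames identically; a flip suggests a framing effect.
FRAMES = {
    "gain": ("Treatment X has a 90% survival rate. Treatment Y has a 91% survival rate "
             "but severe side effects. Which do you choose, X or Y? Answer with one letter."),
    "loss": ("Treatment X has a 10% mortality rate. Treatment Y has a 9% mortality rate "
             "but severe side effects. Which do you choose, X or Y? Answer with one letter."),
}

def ask_model(prompt: str) -> str:
    # Hypothetical placeholder: send `prompt` to the model under test and
    # return its choice ("X" or "Y").
    raise NotImplementedError

def shows_framing_effect() -> bool:
    # The two frames describe the same options, so differing answers indicate
    # that presentation, not content, drove the decision.
    answers = {name: ask_model(prompt) for name, prompt in FRAMES.items()}
    return answers["gain"] != answers["loss"]
```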
The research highlights that to effectively understand and mitigate cognitive biases in LLMs, efforts should focus significantly on the pretraining phase. This perspective can guide future strategies for developing more reliable and less biased AI systems.