AI Papers Reader

Personalized digests of the latest AI research


LLMs Don’t Reason Like Humans: New Cognitive Taxonomy Reveals Mismatch in AI Problem-Solving

A large-scale empirical study analyzing nearly 200,000 reasoning traces from large language models (LLMs) has revealed that these models approach complex problems using cognitive strategies fundamentally misaligned with those that correlate with human success.

The research, conducted by an inter-university team, introduces a comprehensive taxonomy of 28 cognitive elements—synthesized from decades of human cognitive science—offering a standardized vocabulary for diagnosing reasoning failures in AI. The elements are grouped into four dimensions: Reasoning Invariants (e.g., logical coherence), Meta-Cognitive Controls (e.g., self-awareness), Representations (e.g., hierarchical organization), and Operations (e.g., forward chaining).

By comparing the usage patterns of 18 LLMs (across text, vision, and audio tasks) against 54 human “think-aloud” traces, the researchers uncovered a critical paradox: LLMs consistently default to a limited set of rigid, easily quantifiable behaviors, even when a more diverse and adaptable strategy is required for success.

The Misalignment Gap

The analysis shows that LLMs are heavily biased toward simple linear processing: sequential organization and forward chaining (reasoning from facts to goals). This rigid approach proves effective on well-structured problems (like algorithms or story problems) but fails catastrophically on complex, “ill-structured” tasks such as design, case analysis, or ethical dilemmas, where constraints are unclear and multiple solutions exist.

On these complex tasks, successful reasoning requires a broader range of elements. For instance, Meta-Cognitive Controls—the higher-order functions humans use to monitor and adapt their thought process—were strongly correlated with success, but systematically underutilized by LLMs. Self-awareness (assessing one’s own capabilities and knowledge state) appeared in 49% of successful human traces but only 19% of LLM traces.
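The kind of comparison behind figures like the 49% vs. 19% gap can be sketched as a per-element occurrence rate over annotated traces. The traces below are toy placeholders, not the study's data; each trace is represented as the set of cognitive elements it exhibits.

```python
from collections import Counter

def element_rates(traces: list[set[str]]) -> dict[str, float]:
    """Fraction of traces in which each cognitive element appears."""
    counts = Counter(e for trace in traces for e in trace)
    return {e: c / len(traces) for e, c in counts.items()}

# Toy annotated traces (placeholder data, not from the study).
human_traces = [{"self-awareness", "forward chaining"}, {"self-awareness"}]
llm_traces = [{"forward chaining"}, {"forward chaining", "self-awareness"}]

human = element_rates(human_traces)
llm = element_rates(llm_traces)

# Positive gap = element used more often by humans than by LLMs.
gap = {e: human.get(e, 0.0) - llm.get(e, 0.0) for e in human}
```

Running the same computation over real annotated traces would surface exactly the under-utilization pattern the study reports for Meta-Cognitive Controls.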

For intuition, consider a complex Diagnosis-Solution problem, such as troubleshooting a network or diagnosing a medical issue. Successful reasoning involves a deliberate scoping strategy: first applying selective attention to filter symptoms, then achieving knowledge alignment with domain constraints, before engaging in forward solution steps. The study found that LLMs often bypass this crucial scoping phase entirely, rushing immediately into forward chaining, which leads to systematic failures on problems requiring careful constraint satisfaction.
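Whether a trace follows that scoping strategy can be framed as an ordered-subsequence check: the required steps must appear in order, though other steps may be interleaved. This is a hypothetical rendering, assuming traces are annotated as step lists.

```python
# Required scoping sequence, in order (from the description above).
SCOPING_ORDER = ["selective attention", "knowledge alignment", "forward chaining"]

def follows_scoping(trace: list[str]) -> bool:
    """True if the scoping steps occur in order within the trace.

    Other steps may be interleaved; only the relative order matters.
    Uses the iterator-consumption idiom for a subsequence test.
    """
    it = iter(trace)
    return all(step in it for step in SCOPING_ORDER)
```

A trace that jumps straight to forward chaining, as the study says LLMs often do, fails this check.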

Eliciting Latent Capabilities

The findings suggest that current LLM training (which often rewards only the final correct answer) leads models to prioritize easily executed “spurious shortcuts” over robust, human-like cognitive mechanisms.

To test whether models lack these successful behaviors entirely or just fail to deploy them spontaneously, the team developed test-time reasoning guidance. This intervention automatically converts empirically successful human-like reasoning structures into explicit prompts, essentially scaffolding the LLM’s thought process.
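One plausible rendering of such guidance is a prompt prefix that spells out the target reasoning structure. The function below is a sketch under that assumption; the paper's actual prompt format is not specified here.

```python
def scaffold_prompt(problem: str, structure: list[str]) -> str:
    """Prepend an explicit reasoning scaffold to a problem statement.

    `structure` is an ordered list of cognitive steps (e.g., a
    scoping sequence) that the model is instructed to follow.
    """
    steps = "\n".join(f"{i}. {step}" for i, step in enumerate(structure, 1))
    return (
        "Before answering, follow these reasoning steps in order:\n"
        f"{steps}\n\n"
        f"Problem: {problem}"
    )
```

For example, `scaffold_prompt("Why is the server unreachable?", ["selective attention", "knowledge alignment", "forward chaining"])` produces a prompt that forces the scoping phase before solution steps.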

Applying this guidance resulted in performance improvements of up to 66.7% on ill-structured problems like diagnosis and dilemmas. This dramatic boost confirms that capable LLMs possess the underlying cognitive components necessary for successful reasoning, but they fail to integrate and sequence them strategically without explicit structural instruction.

This research establishes a necessary framework for shifting LLM development away from chasing benchmark performance and toward building models that rely on robust, verifiable cognitive mechanisms.