The Illusion of Caution: Why AI Risk Decisions Aren’t as Human as They Seem

When a financial AI recommends a conservative investment portfolio, we naturally assume the system is exhibiting human-like caution. However, a study by researchers at Fudan University and the University of Rochester reveals that this apparent prudence is often just a superficial mask. Large Language Models (LLMs) frequently mimic safe human decisions without actually sharing the underlying reasoning mechanisms that humans use to evaluate risk.

To expose this gap, the researchers turned to a classical economics puzzle: the St. Petersburg Game.

The Infinite Coin Toss

Imagine a game where a fair coin is flipped until it lands on heads. If it lands on heads on the first flip, you win $2. If it takes two flips, you win $4; three flips, $8; and so on, with the prize doubling for each consecutive tails. Mathematically, because the prize doubles while its probability halves at each step, the expected payout of this game is infinite. A risk-neutral player who cares only about expected value should therefore be willing to pay any price, however large, to play.
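
As a worked check of that claim (a sketch only; the symbols X for the payout and k for the toss on which the first heads appears are notation introduced here, not taken from the study), the first heads lands on toss k with probability 2^{-k} and pays 2^k dollars:

```latex
\mathbb{E}[X]
  = \sum_{k=1}^{\infty} \Pr(\text{first heads on toss } k) \cdot 2^{k}
  = \sum_{k=1}^{\infty} 2^{-k} \cdot 2^{k}
  = \sum_{k=1}^{\infty} 1
  = \infty
```

Every possible outcome adds exactly $1 to the expectation, so the sum never converges.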

Yet, humans consistently refuse to pay much. Real people typically offer a modest $10 to $20, reflecting a natural aversion to risk and a cognitive boundary on extreme wealth.

When the researchers tested 28 prominent LLMs (including models from OpenAI, Anthropic, Google, and Alibaba), the AIs initially seemed remarkably human-aligned. Most of them avoided the infinite mathematical trap and offered finite bids, with a median offer of around $10 to $20.

Cracking the Facade

To find out if the AIs were actually “reasoning” like cautious humans, the researchers introduced subtle structural tweaks to the game. If the models possessed a human-like mechanism for assessing risk, their behaviors should have shifted in predictable, human-like ways. Instead, the models collapsed into rigid mathematical computation.

For example, in a “truncation” variant, the researchers capped the game at a maximum of 20 tosses. For a human, this minor limit barely changes the risk profile, and they would still bid cautiously low. For the AI, however, this cap introduced a hard mathematical boundary: the expected payout of a 20-toss game is exactly $21. Instantly, most models abandoned their “cautious” stance and bid exactly $21. They weren’t behaving like risk-averse humans; they were simply tracking mathematical limits.
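
The $21 figure can be reproduced with a short calculation. The sketch below is illustrative and not the researchers' code: the function name `truncated_expected_payout` and the `pay_on_no_heads` flag are hypothetical, and the rule for what happens when all 20 tosses come up tails is an assumption, since the article does not spell it out. Paying the top prize of $2^20 in that case gives $21; paying nothing gives $20.

```python
from fractions import Fraction


def truncated_expected_payout(max_tosses: int, pay_on_no_heads: bool = True) -> Fraction:
    """Expected payout of a St. Petersburg game capped at `max_tosses` flips.

    Assumption (not stated in the article): if no heads appears within the
    cap, the player either receives the top prize 2**max_tosses
    (pay_on_no_heads=True) or receives nothing (pay_on_no_heads=False).
    """
    total = Fraction(0)
    for k in range(1, max_tosses + 1):
        probability = Fraction(1, 2 ** k)  # first heads lands on toss k
        total += probability * 2 ** k      # each outcome contributes exactly $1
    if pay_on_no_heads:
        # All tosses were tails: probability 2**-max_tosses, prize 2**max_tosses.
        total += Fraction(1, 2 ** max_tosses) * 2 ** max_tosses
    return total


print(truncated_expected_payout(20))                         # 21
print(truncated_expected_payout(20, pay_on_no_heads=False))  # 20
```

Either way the cap turns an unbounded sum into a small, exactly computable number, which is the boundary the models appear to have latched onto.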

In other tests, the researchers manipulated the player’s starting wealth or gave them occupational identities, such as a low-income agricultural worker versus a high-earning IT manager. While humans scale their risk-taking based on personal wealth and social roles, the LLMs showed weak, flat, or highly inconsistent reactions to these context clues.
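
To make the wealth effect concrete, here is a minimal sketch of how a textbook risk-averse player would scale a bid with wealth. It assumes logarithmic utility of wealth (Bernoulli's classic treatment of this game), which is a stand-in chosen for illustration and not the model used in the study; the function name `max_bid_log_utility` and its parameters are hypothetical.

```python
import math


def max_bid_log_utility(wealth: float, max_tosses: int = 200) -> float:
    """Largest bid a log-utility player with `wealth` dollars would accept.

    Assumptions (illustrative, not from the study): utility is ln(wealth),
    the bid cannot exceed current wealth, and the infinite game is
    approximated by its first `max_tosses` outcomes.
    """
    def expected_log_wealth(bid: float) -> float:
        # Expected utility after paying `bid` and collecting the prize 2**k.
        return sum(0.5 ** k * math.log(wealth - bid + 2.0 ** k)
                   for k in range(1, max_tosses + 1))

    target = math.log(wealth)  # utility of not playing at all
    lo, hi = 0.0, wealth
    if expected_log_wealth(hi) >= target:
        return hi  # even staking everything is acceptable
    # Expected utility falls as the bid rises, so bisect for the break-even bid.
    for _ in range(100):
        mid = (lo + hi) / 2
        if expected_log_wealth(mid) >= target:
            lo = mid
        else:
            hi = mid
    return lo


for wealth in (1_000.0, 1_000_000.0):
    print(f"wealth ${wealth:,.0f}: max bid ${max_bid_log_utility(wealth):.2f}")
```

Under these assumptions the break-even bid comes out at about $11 for a player with $1,000 and about $21 for a player with $1,000,000, so the bid rises smoothly with wealth. That is the kind of systematic shift the models failed to show.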

Masking, Not Aligning

The study also evaluated common techniques used to steer AI behavior, such as “instruction tuning” or adding prompts like, “Imagine you are a human.”

While these interventions successfully lowered the dollar amounts of the AIs’ bids, making them look even more cautious on the surface, they failed to change the underlying reasoning. When the modified scenarios were reintroduced, the models still reverted to non-human mathematical optimization.

Ultimately, the study warns that evaluating AI safety based solely on final decisions is a dangerous shortcut. If an AI agent in finance, medicine, or insurance is only mimicking human caution, a slight shift in real-world variables could cause its “prudence” to abruptly shatter, leaving behind unpredictable and high-stakes failures.
