AI Papers Reader

Personalized digests of latest AI research

The Soul in the Machine: Why Avid Readers Still Prefer Human Translators Over AI

Can artificial intelligence capture the rhythmic, emotional heartbeat of a literary masterpiece? As publishers increasingly turn to Large Language Models (LLMs) to translate foreign fiction, a new study reveals that while AI translations are now passable, human translators still hold a crucial edge. Researchers from Simon Fraser University, the Université du Québec à Montréal, and Microsoft recently put state-of-the-art AI to the test against professional human translators, revealing exactly where the machine’s “soul” goes missing.

The study recruited 15 avid readers to evaluate translated excerpts from 15 recently published novels originally written in French, Polish, and Japanese. The researchers compared professional human translations (HT) with machine translations (MT) generated by an advanced, multi-step agentic AI pipeline. To capture the true reading experience, participants engaged in “immersive reading”—consuming uninterrupted 8,000-word blocks—before performing a side-by-side, chunk-by-chunk “close reading” comparison.

The results showed a slight but consistent preference for human translations during immersive reading, which widened significantly during close analysis. While readers rated AI translations as highly readable, they frequently noted that the AI lacked “soul.”

A concrete example from the translation of Polish author Wiesław Myśliwski’s Needle’s Eye highlights this gap. Where the AI rendered a line as “his legs refuse obedience and his eyes no longer come together in seeing,” the human translator wrote, “when your legs won’t do what they’re told to and your eyes can’t make out the world.” The human version flows naturally, while the AI version feels sterile and overly literal—what one reader dismissed as “translation for dummies.”

However, the AI occasionally won local battles by choosing more evocative vocabulary. In a Japanese novel, the AI opted for “bento containers” over the human translator’s generic “plastic containers,” anchoring the scene more vividly in its cultural context. In a separate, historical text, the AI correctly favored “spectacles” over the human’s “glasses” to match the period. Yet the overall quality of the AI translations was highly volatile, marred by sudden “clunky” drops in sentence structure, whereas the human prose remained stable.

Intriguingly, readers struggled to reliably identify which text was which, guessing correctly only about half the time. Many fell victim to “folk theories” about AI. For instance, some wrongly assumed a passage with frequent em-dashes or vulgar language must be human-authored, believing AI would be too “polite” or cleanly formatted.
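To make "about half the time" concrete, here is a minimal sketch of how one might check whether a set of human-versus-AI guesses is distinguishable from coin-flipping, using an exact two-sided binomial test. The function name `binomial_two_sided_p` and the counts below are hypothetical placeholders for illustration, not figures or methods from the study.

```python
from math import comb


def binomial_two_sided_p(successes: int, trials: int, p: float = 0.5) -> float:
    """Exact two-sided binomial p-value.

    Sums the probability of every outcome that is no more likely than the
    observed one under the null hypothesis of success probability `p`.
    """
    pmf = [comb(trials, k) * p**k * (1 - p) ** (trials - k) for k in range(trials + 1)]
    observed = pmf[successes]
    # Small relative tolerance so outcomes tied with the observed one are included.
    return min(1.0, sum(q for q in pmf if q <= observed * (1 + 1e-9)))


# Hypothetical counts for illustration only (NOT taken from the study).
correct, total = 52, 100
p_value = binomial_two_sided_p(correct, total)
print(f"accuracy={correct / total:.2f}, p-value={p_value:.3f}")
```

A large p-value here would mean the guesses are statistically consistent with chance, which is the situation the article describes.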

Crucially, the study also found that popular automatic evaluation metrics, and even LLM-as-a-judge approaches, failed to capture these nuances, consistently favoring the AI output over the human translation. Furthermore, when translating into target languages other than English, such as Spanish or Polish, the AI’s mask slipped entirely, exposing jarring grammatical errors and literal phrasing.

Ultimately, the research suggests that while AI has progressed from producing word salad to producing highly competent drafts, it cannot replace the nuanced artistry of a human translator. For now, the true magic of literature remains a uniquely human craft.