
Breaking the AI “Silence Tax”: New Training Method Helps LLMs Think Before They Speak

Anyone who has used a “reasoning” AI model is familiar with the “silence tax.” You ask a complex math question, and the screen stays blank for thirty seconds while the model generates thousands of words of internal “Chain-of-Thought” (CoT) reasoning. Only after this long wait does the first word of the actual answer finally appear.

This delay stems from a fundamental coupling in the standard interface: every token the model generates is both a piece of its internal “thought” and a public commitment to the user. If a model tries to speak too early to reduce wait times, it risks making a “premature commitment,” essentially blurting out a wrong guess that it then feels forced to justify in its later reasoning.

A new paper titled “When to Think, When to Speak: Learning Disclosure Policies for LLM Reasoning” introduces a clever solution called Side-by-Side (SxS) Interleaved Reasoning. Developed by researchers from several top institutions, including Zhejiang University and the University of Illinois, SxS allows a model to decide exactly when to share progress and when to keep its head down and keep thinking.

The Power of Interleaving

The core idea of SxS is to turn “disclosure” into a controllable action. Instead of a single stream of text, the model is trained to use two distinct modes: think and speak.

To build an intuition for this, imagine a student solving a complex multi-step physics problem on a whiteboard. Under the old system, the student works silently in their head for ten minutes and then writes the entire solution at once. Under the SxS system, the student might calculate the first variable, turn to the class and say, “First, we know the velocity is 5 m/s,” then turn back to the board to do more calculations, and then announce, “Therefore, the total force is 10 Newtons.”

The user sees progress in real time, but the model isn’t “guessing.” It only “speaks” when its internal reasoning up to that point already entails the claim.
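
To make this concrete, here is a minimal Python sketch of how a chat interface might consume such an interleaved stream, surfacing “speak” segments immediately while withholding “think” segments. The <think>/<speak> tag names and the rendering logic are illustrative assumptions, not the paper’s actual serialization format.

```python
import re

# Minimal sketch of consuming an interleaved think/speak stream.
# ASSUMPTION: segments are wrapped in <think>...</think> and
# <speak>...</speak> tags; the paper's real format may differ.
SEGMENT = re.compile(r"<(think|speak)>(.*?)</\1>", re.DOTALL)

def render(model_output: str) -> None:
    """Print 'speak' segments immediately; keep 'think' segments hidden."""
    for match in SEGMENT.finditer(model_output):
        mode, text = match.group(1), match.group(2).strip()
        if mode == "speak":
            print(f"[visible update] {text}")  # the user sees this right away
        # 'think' segments are withheld (they could be logged for debugging)

render(
    "<think>v = d/t = 10 m / 2 s, so v = 5 m/s.</think>"
    "<speak>First, we know the velocity is 5 m/s.</speak>"
    "<think>F = m * a, with m = 2 kg and a = 5 m/s^2.</think>"
    "<speak>Therefore, the total force is 10 Newtons.</speak>"
)
```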

Learning to Be Certain

The researchers achieved this by training the model on “entailment-aligned” data. They used a larger “judge” model to look at raw reasoning traces and determine the exact moment a specific part of the answer became logically certain.
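
As a rough sketch, the labeling pass might look like the following. The judge_entails helper is a hypothetical stand-in for a call to the judge model (reduced here to a toy substring check so the example runs); the paper’s actual prompting is not reproduced.

```python
# Sketch of labeling "entailment-aligned" disclosure points with a judge.
# ASSUMPTION: judge_entails() stands in for a larger judge LLM; here it
# is a toy substring check so the example runs end to end.
def judge_entails(reasoning_prefix: str, claim: str) -> bool:
    """Toy stand-in: a real pipeline would prompt the judge model and ask
    whether the reasoning so far logically entails the claim."""
    return claim in reasoning_prefix

def earliest_disclosure_point(thought_steps: list[str], claim: str) -> int | None:
    """Index of the first reasoning step after which `claim` is entailed,
    i.e. the earliest moment it becomes 'safe' to speak it."""
    for i in range(len(thought_steps)):
        prefix = " ".join(thought_steps[: i + 1])
        if judge_entails(prefix, claim):
            return i
    return None  # the trace never supports the claim: do not disclose it

steps = ["3x + 5 = 20", "subtract 5: 3x = 15", "divide by 3: x = 5"]
print(earliest_disclosure_point(steps, "x = 5"))  # -> 2
```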

For example, if a model is solving the equation $3x + 5 = 20$, the judge model marks the moment the AI derives $3x = 15$ in its internal “thought” channel as the point at which it becomes “safe” to disclose to the user that $x = 5$. This ensures the model doesn’t output “filler” text just to look busy, but provides only substantive updates.
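
Rendered as training data, that example might look like the following. The mode-tagged format is a hypothetical illustration, not the paper’s concrete schema:

```python
# Hypothetical entailment-aligned interleaved trace for 3x + 5 = 20.
# The format (a list of mode-tagged turns) is illustrative only.
example = [
    {"mode": "think", "text": "Subtract 5 from both sides: 3x = 15."},
    # The judge ruled that the thought above already entails the claim
    # below, so the model may disclose it now instead of staying silent.
    {"mode": "speak", "text": "We can already conclude that x = 5."},
    {"mode": "think", "text": "Sanity check: 3 * 5 + 5 = 20. Consistent."},
    {"mode": "speak", "text": "Final answer: x = 5, since 3(5) + 5 = 20."},
]
```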

Better Performance, Less Waiting

The results are impressive. Testing on difficult benchmarks like AIME25 (high school math) and GPQA (graduate-level science), the researchers found that SxS significantly improved the “accuracy-content-latency” trade-off.

In one experiment using the Qwen3-4B model, the “Average Inter-Response Wait” (the token gap between visible updates) dropped from over 21,000 tokens in standard models to just 8,500 tokens with SxS. Crucially, this was achieved without losing accuracy.
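
To make that metric concrete, here is a small sketch that computes an average inter-response wait from the token positions at which visible updates appear; the paper’s exact definition may differ in its details.

```python
# Sketch: average inter-response wait, i.e. the mean number of generated
# tokens between consecutive visible ("speak") updates.
# ASSUMPTION: this matches the paper's metric only approximately.
def average_inter_response_wait(update_positions: list[int]) -> float:
    """`update_positions` are token indices where visible updates appear;
    position 0 marks the start of generation."""
    if not update_positions:
        return float("inf")  # no visible updates: the user waits forever
    waits, prev = [], 0
    for pos in update_positions:
        waits.append(pos - prev)
        prev = pos
    return sum(waits) / len(waits)

# A standard CoT model emits one visible answer after ~21,000 thought tokens:
print(average_inter_response_wait([21_000]))                 # 21000.0
# An SxS model surfacing updates along the way waits far less between them:
print(average_inter_response_wait([8_000, 17_500, 25_000]))  # ~8333.3
```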

By treating the decision of when to speak as a learned skill rather than a fixed rule, the researchers have moved us closer to AI that feels more like a collaborative human expert: one who thinks deeply but keeps you in the loop as the solution emerges.