AI Models Learn to Stop Wasting Words, Solving Math Problems Twice as Fast
New research demonstrates that training AI on “easy” problems can dramatically cut down on unnecessary verbosity, leading to models that are both more accurate and significantly more efficient.
A team of researchers has developed a novel technique that trains large language models (LLMs) to reason concisely, tackling one of the major efficiency hurdles in advanced AI systems: runaway verbosity.
LLMs trained for complex tasks, particularly step-by-step mathematical reasoning, often become excessively wordy. This is typically because standard reinforcement learning pipelines (specifically Reinforcement Learning with Verifiable Rewards, or RLVR) filter out simple, “easy” problems to maximize training signal efficiency. That filtering inadvertently biases the model toward long reasoning chains: the AI learns to conflate “thinking longer” with “thinking better,” inflating inference costs and latency.
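The filtering step described above can be sketched as follows. This is a minimal illustration, not the paper's actual pipeline; the `pass_rate` field and the 0.8 threshold are assumptions for the example.

```python
# Sketch of standard RLVR difficulty filtering (illustrative only;
# the field names and the 0.8 threshold are assumed, not from the paper).
def filter_for_training(samples, max_pass_rate=0.8):
    """Discard 'easy' problems the model already solves reliably,
    keeping only samples that still provide a strong learning signal."""
    return [s for s in samples if s["pass_rate"] < max_pass_rate]

pool = [
    {"id": "easy-1", "pass_rate": 0.95},  # solved almost always -> dropped
    {"id": "mid-1",  "pass_rate": 0.50},
    {"id": "hard-1", "pass_rate": 0.10},
]
kept = filter_for_training(pool)
# Only "mid-1" and "hard-1" survive: the short, easy solutions vanish
# from training, so the model mostly sees long reasoning chains.
```

The easy samples that would have demonstrated short, correct solutions never reach the optimizer.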
The new approach, detailed in the paper Shorter but not Worse: Frugal Reasoning via Easy Samples as Length Regularizers in Math RLVR, flips this logic. Instead of discarding simple samples, the researchers retained and slightly up-weighted moderately easy problems.
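In code, that flipped curation could look like the sketch below. The band boundaries and the 1.2 weight are my own assumptions; the paper only states that moderately easy samples are retained and slightly up-weighted.

```python
# Hedged sketch of the retain-and-up-weight curation step
# (band boundaries and weight value are assumptions, not the paper's).
def curate_with_easy_samples(samples, easy_band=(0.8, 0.95), easy_weight=1.2):
    """Keep moderately easy problems and give them a slight extra
    sampling weight instead of filtering them out entirely."""
    lo, hi = easy_band
    curated = []
    for s in samples:
        if s["pass_rate"] > hi:  # trivially easy: still dropped
            continue
        w = easy_weight if lo <= s["pass_rate"] <= hi else 1.0
        curated.append({**s, "weight": w})
    return curated

pool = [
    {"id": "trivial", "pass_rate": 0.99},  # dropped
    {"id": "easyish", "pass_rate": 0.85},  # retained and up-weighted
    {"id": "hard",    "pass_rate": 0.10},  # retained at normal weight
]
curated = curate_with_easy_samples(pool)
```

The moderately easy problems stay in the mix, so short, correct reasoning chains keep appearing in the training signal.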
Emergent Brevity for Free
The key insight is that these short, solvable tasks function as an “implicit length regularizer.” When exposed to problems that can be solved correctly with just a few tokens, the model learns that maximum output length is not necessary for maximum reward. This inductive bias subtly yet powerfully constrains the model’s output distribution.
The model gains what the authors term “emergent brevity for free.” It learns to solve even the most challenging problems efficiently, achieving conciseness without an explicit penalty on long outputs, a common but notoriously hard-to-tune alternative.
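The reward structure at play can be illustrated with a toy example (a sketch under my own assumptions, not the paper's implementation). Under a plain verifiable reward, correctness is all that matters, so a three-token correct answer earns exactly as much as a five-hundred-token one; the explicit length penalty shown alongside it is the hard-to-tune alternative the authors avoid.

```python
def verifiable_reward(response_tokens, is_correct):
    """RLVR-style binary reward: correctness only, no length term."""
    return 1.0 if is_correct else 0.0

def penalized_reward(response_tokens, is_correct, alpha=1e-4):
    """The alternative: an explicit length penalty that must be tuned.
    (alpha is a made-up coefficient for illustration.)"""
    base = 1.0 if is_correct else 0.0
    return base - alpha * len(response_tokens)

short = ["x", "=", "4"]   # 3-token correct answer
long_ = ["step"] * 500    # 500-token correct answer

# Under the plain verifiable reward, both earn 1.0. Brevity emerges
# only because easy samples show the model that the full reward is
# reachable with very few tokens.
```

The point of the contrast: with the plain reward there is no coefficient to tune, yet the presence of easy samples still pushes output length down.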
To build intuition, consider a student learning algebra. If every test question requires two pages of work, the student concludes that long answers are always better. But if short, correct answers consistently receive high scores on simpler questions, the student learns to reserve lengthy explanations only for truly complex problems.
Efficiency Gains Exceeding 50%
The researchers validated their approach by fine-tuning a 4-billion-parameter model, Qwen3-4B-Thinking-2507, creating two new variants dubbed the Frugal-Math models.
The results show dramatic efficiency improvements, particularly on difficult math benchmarks. For tasks like the American Invitational Mathematics Examination (AIME25) and Omni-MATH-Hard (Olympiad-level difficulty), the Frugal-Math models maintained or even improved accuracy while reducing the average solution length by 55% to 61%.
In raw token counts, the base model averaged 11,491 tokens per sample across various benchmarks. The final optimized model (Frugal-Math-4B-Stage2) averaged just 5,712 tokens, a reduction of roughly half.
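The headline reduction follows directly from those two averages:

```python
# Token averages reported in the article
base_tokens = 11_491    # base model, average tokens per sample
frugal_tokens = 5_712   # Frugal-Math-4B-Stage2

reduction = 1 - frugal_tokens / base_tokens
print(f"{reduction:.1%}")  # -> 50.3%
```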
The findings underscore the crucial role of data curation in shaping efficient AI behavior. The study demonstrates that by balancing the training set to include concise, solvable examples, developers can achieve high reasoning performance while simultaneously reducing computational overhead, making advanced reasoning tools faster and more cost-effective.