New AI Technique 'Mixture of Inputs' Boosts Text Generation by Blending Certainty with Uncertainty
A new method called “Mixture of Inputs” (MOI) has emerged, offering a way to significantly improve text generation by large language models (LLMs) without requiring any additional training. This approach, described in a recent paper, allows LLMs to maintain a richer understanding of context by blending the most likely next word (the ‘sampled token’) with the range of other possibilities the model considered (the ‘token distribution’).
In standard text generation, an LLM predicts the probability of the next word in a sequence, then picks the most likely word. However, it immediately discards the probability distribution, losing valuable information about the alternative possibilities the model considered. MOI addresses this limitation by reintroducing this discarded information.
Imagine the LLM is writing a story and the context suggests the next word is likely to be “happy.” In conventional text generation, the model commits to “happy” and moves on. But MOI allows the model to also retain a sense of other possibilities, like “joyful” or “content,” even if they were slightly less probable. This helps the model adapt if the story takes an unexpected turn later on.
The technique employs a Bayesian estimation method, which mathematically combines the model’s initial prediction (the token distribution) with the chosen word. Specifically, MOI treats the predicted probability distribution of tokens as the “prior” and the sampled token as the “observation”. The method then calculates a “posterior expectation,” a mathematically-weighted average between all the possible words the LLM was predicting. This posterior expectation is then used to inform the next prediction.
The research paper’s results highlight the effectiveness of MOI across a range of LLMs, including QwQ-32B, Nemotron-Super-49B, Gemma-3-27B, and DAPO-Qwen-32B. It showed improvements in mathematical reasoning, code generation, and PhD-level question-answering tasks. The improvements were particularly noticeable in tasks requiring multi-step reasoning, where maintaining uncertainty is critical.
For example, the researchers found that MOI improved performance on “Count Down 4,” a mathematical puzzle task, by explicitly representing uncertainty over arithmetic operations. This helped mitigate errors that tend to accumulate during multiple steps. The researchers also tested the method using other measures such as performance on AIME, GPQA-Diamond, and LiveCodeBench, and again saw performance improvements.
Importantly, MOI doesn’t require any changes to the model’s architecture or any retraining. This makes it immediately applicable to existing LLMs, with only “negligible computational overhead and minimal deployment effort,” the authors wrote.
The code for MOI is available on GitHub, allowing researchers and developers to readily integrate it into their existing LLM-based applications. This new technique may lead to the creation of more robust, context-aware, and ultimately more useful LLMs.
Chat about this paper
To chat about this paper, you'll need a free Gemini API key from Google AI Studio.
Your API key will be stored securely in your browser's local storage.