Latent Tokens Boost Language Model Reasoning

Large language models (LLMs) are getting better at reasoning, especially when trained on chain-of-thought (CoT) data. CoT data spells out reasoning step by step, but the resulting traces are long and computationally expensive to train on and generate. A new paper, “Token Assorted: Mixing Latent and Text Tokens for Improved Language Model Reasoning,” proposes a solution: partially replacing the reasoning steps in CoT data with discrete latent tokens generated by a vector-quantized variational autoencoder (VQ-VAE).
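
To make the mechanism concrete, here is a minimal sketch of the vector-quantization step at the heart of a VQ-VAE: each continuous encoder output is snapped to its nearest entry in a learned codebook, and that entry's index serves as the discrete latent token. The class, dimensions, and tensors below are illustrative stand-ins, not the paper's implementation.

```python
import torch
import torch.nn as nn

class VectorQuantizer(nn.Module):
    """Nearest-neighbor lookup into a learned codebook (illustrative sketch)."""

    def __init__(self, num_codes=512, dim=64):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, dim)

    def forward(self, z):
        # z: (batch, seq, dim) continuous encoder outputs for reasoning chunks.
        dists = torch.cdist(z, self.codebook.weight.unsqueeze(0))  # (batch, seq, num_codes)
        indices = dists.argmin(dim=-1)        # discrete latent tokens (code indices)
        quantized = self.codebook(indices)    # snap back to the nearest code vectors
        # Straight-through estimator: copy gradients through the discretization.
        quantized = z + (quantized - z).detach()
        return indices, quantized

vq = VectorQuantizer()
chunk_embeddings = torch.randn(1, 8, 64)   # stand-in for encoded reasoning chunks
latent_tokens, _ = vq(chunk_embeddings)
print(latent_tokens.shape)                 # torch.Size([1, 8])
```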

The core idea is to compress the often verbose and repetitive parts of the reasoning process into a shorter, more abstract representation. Imagine a problem-solving sequence like: “The train leaves at 3 pm. It takes 2 hours to get to the city. Therefore, the train arrives at 5 pm.” The CoT approach spells every step out in text. The researchers’ method compresses part of that trace into latent tokens while keeping the core logic (“train leaves at 3 pm,” “takes 2 hours,” “arrives at 5 pm”) in plain text.
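
A toy illustration of the resulting hybrid sequence, under the assumption that the leading reasoning chunks are the ones abstracted away; the `<latent_k>` placeholders and the chunking are invented for illustration:

```python
# Toy illustration of a hybrid trace: the leading reasoning chunks are
# abstracted into latent-token placeholders. The <latent_k> names and the
# chunking are invented for illustration, not the paper's actual tokens.
cot_steps = [
    "The train leaves at 3 pm.",
    "It takes 2 hours to get to the city.",
    "Therefore, the train arrives at 5 pm.",
]

def to_hybrid(steps, num_compressed):
    """Replace the first `num_compressed` steps with latent-token placeholders."""
    latents = [f"<latent_{i}>" for i in range(num_compressed)]
    return latents + steps[num_compressed:]

print(to_hybrid(cot_steps, num_compressed=2))
# ['<latent_0>', '<latent_1>', 'Therefore, the train arrives at 5 pm.']
```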

This hybrid representation is then used to fine-tune LLMs. To help the model adapt quickly to the new latent tokens, the authors introduce a randomized replacement strategy during training: instead of always replacing a fixed number of text tokens with latent tokens, the number replaced is varied randomly for each training example, so the model sees many different mixing ratios.
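
A minimal sketch of that strategy, assuming the trace is pre-split into chunks and the leading chunks are the ones replaced; all function and variable names here are illustrative:

```python
import random

def randomize_replacement(text_chunks, encode_to_latents, max_replaced):
    """Replace a randomly chosen number of leading chunks with latent tokens."""
    m = random.randint(0, min(max_replaced, len(text_chunks)))  # varies per example
    latent_tokens = encode_to_latents(text_chunks[:m])          # e.g. VQ-VAE code ids
    return latent_tokens + text_chunks[m:]

# Stand-in encoder for demonstration; a real pipeline would call the VQ-VAE.
fake_encoder = lambda chunks: [f"<latent_{i}>" for i in range(len(chunks))]
example = ["step 1", "step 2", "step 3", "step 4"]
print(randomize_replacement(example, fake_encoder, max_replaced=3))
```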

The researchers tested their method on a variety of tasks. In the Keys-Finding Maze problem, for example, an agent must navigate a maze, collecting keys to open doors and reach a goal; models fine-tuned with the hybrid representation solved these problems at substantially higher rates. On mathematical reasoning benchmarks such as GSM8K and MATH, the approach also showed consistent improvement, along with an average 17% reduction in the length of reasoning traces. Shorter traces mean the model reaches the answer with fewer generated tokens, which translates to better computational efficiency.

One interesting aspect of the research is the attention analysis. By examining the attention weights, the authors found that models trained with their approach focused more on numerical information and mathematical operations within the prompt than CoT-only trained models did. This suggests the latent tokens help the models filter out unnecessary details and focus on the core elements needed for problem-solving.
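
In the same spirit, one can probe where a model's attention mass lands. The snippet below computes the share of attention the final token places on numeric prompt tokens, using a random matrix as stand-in data rather than real model weights:

```python
import torch

# Toy probe in the spirit of the paper's attention analysis: measure how much
# attention mass the final query token places on numeric prompt tokens. The
# attention matrix here is random stand-in data, not real model weights.
tokens = ["The", "train", "leaves", "at", "3", "pm", "."]
attn = torch.softmax(torch.randn(len(tokens), len(tokens)), dim=-1)  # rows sum to 1

numeric_mask = torch.tensor([t.isdigit() for t in tokens])
numeric_share = attn[-1][numeric_mask].sum().item()  # share of the last row's mass
print(f"attention mass on numeric tokens: {numeric_share:.2%}")
```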

The use of latent tokens does raise a safety concern: their meaning is not immediately transparent to a human reader. The researchers address this by providing a VQ-VAE decoder that translates the latent tokens back into text, restoring transparency to the compressed reasoning.
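
Here is a sketch of how such a readout could work, with a tiny random codebook, decoder, and vocabulary as stand-ins; the interfaces are illustrative, not the paper's actual code:

```python
import torch
import torch.nn as nn

# Sketch of rendering latent tokens back into text for auditing. The codebook,
# decoder, and vocabulary are tiny random stand-ins for illustration only.
vocab = ["<pad>", "train", "leaves", "3", "pm", "2", "hours", "arrives", "5"]
codebook = nn.Embedding(16, 8)        # 16 latent codes, dim 8
decoder = nn.Linear(8, len(vocab))    # maps a code vector to token logits

def inspect_latents(latent_ids):
    """Latent token ids -> code vectors -> greedy text readout."""
    vectors = codebook(latent_ids)    # (n, 8)
    logits = decoder(vectors)         # (n, vocab)
    return [vocab[i] for i in logits.argmax(dim=-1).tolist()]

print(inspect_latents(torch.tensor([3, 7, 11])))  # random output, e.g. ['pm', '5', 'hours']
```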

Overall, this research offers a promising method for improving LLM reasoning capabilities while enhancing efficiency. The randomized token replacement strategy provides a practical approach to handling the introduction of new latent tokens, making the method scalable and applicable to various reasoning tasks. The results, across multiple benchmarks and model sizes, strongly suggest that using latent representations of reasoning steps provides a beneficial trade-off between performance and computational efficiency.