AI Papers Reader

Personalized digests of latest AI research


AI Agents Ditch Language: New Framework Enables Collaboration via ‘Pure Thought’ for 4x Speedup

A new large language model (LLM) framework allows multi-agent systems (MAS) to collaborate using continuous internal representations—or “pure thought”—rather than relying on explicit text exchanges, achieving dramatic gains in speed, efficiency, and reasoning quality.

Dubbed LatentMAS, the training-free system overcomes the inherent communication bottleneck of traditional MAS, where agents must convert their complex reasoning into discrete natural language tokens (like a verbose internal monologue) for other agents to read and re-encode. Researchers from Princeton, the University of Illinois Urbana-Champaign, and Stanford demonstrated that LatentMAS is on average 4.3 times faster than text-based systems while using over 80% fewer output tokens.

The breakthrough lies in moving collaboration from the discrete realm of text tokens to the continuous latent space—the hidden embeddings within the LLM’s transformer layers.

In conventional multi-agent pipelines, solving a complex problem—like multi-step math or coding—requires a sequence of agents (e.g., Planner, Critic, Refiner, Solver) to communicate via text. If a Planner generates a 100-word justification for a step, the Critic must read and process all 100 tokens.
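This decode-and-re-encode relay can be sketched in a few lines of Python. The toy below is illustrative only: the agent roles come from the article, but the tokenizer, the `run_agent` stand-in, and the pipeline structure are simplifying assumptions, not the paper's implementation.

```python
# Toy sketch of a text-based multi-agent pipeline: every handoff forces
# a decode to text and a re-encode to tokens, so downstream agents pay
# to re-process every word the upstream agent wrote.
# All names here are illustrative stand-ins, not the paper's actual API.

def tokenize(text):
    """Stand-in tokenizer: one token per whitespace-separated word."""
    return text.split()

def run_agent(role, input_tokens):
    """Stand-in for an LLM call: returns a verbose text message."""
    # A real agent would emit a long natural-language justification here.
    return f"{role} processed {len(input_tokens)} tokens and adds its own reasoning"

def text_pipeline(question, roles=("Planner", "Critic", "Refiner", "Solver")):
    message = question
    total_tokens_processed = 0
    for role in roles:
        tokens = tokenize(message)          # re-encode the upstream text
        total_tokens_processed += len(tokens)
        message = run_agent(role, tokens)   # decode back to text for the next agent
    return message, total_tokens_processed

answer, cost = text_pipeline(
    "If a train travels 60 miles in 1.5 hours, what is its speed?"
)
```

Even in this toy, the token cost grows with every agent added to the chain, since each one must re-read its predecessor's full message.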

LatentMAS short-circuits this process. When one agent (Agent A) reasons, it generates “latent thoughts” as auto-regressive sequences of last-layer hidden states. These continuous states inherently contain richer semantic structure than text. This information isn’t decoded into words but is instead transferred directly to the next agent (Agent B) via a shared latent working memory stored in the Key-Value (KV) cache.
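A pure-Python toy of that handoff is sketched below. The class and update rules are stand-ins invented for illustration: real latent thoughts are transformer hidden states and the shared memory is the KV cache, not a Python list.

```python
# Toy sketch of latent handoff: agents exchange continuous vectors
# through a shared working memory instead of decoding to text.
# The vectors, update rule, and "attention" below are simple stand-ins
# for real transformer hidden states and KV-cache reads.

class SharedLatentMemory:
    """Stands in for the shared KV cache holding latent thoughts."""
    def __init__(self):
        self.states = []  # hidden-state vectors, stored as lists of floats

    def append(self, hidden_state):
        self.states.append(hidden_state)

def agent_a_think(memory, prompt_state, n_thoughts=3):
    """Agent A auto-regressively emits latent thoughts (toy update rule)."""
    state = prompt_state
    for _ in range(n_thoughts):
        # Stand-in for one generation step: a cheap deterministic transform.
        state = [0.5 * x + 0.1 for x in state]
        memory.append(state)  # written directly -- never decoded into words

def agent_b_refine(memory):
    """Agent B conditions on A's latent thoughts without re-encoding text."""
    # Stand-in for attending over the shared cache: average the stored states.
    n = len(memory.states)
    dim = len(memory.states[0])
    return [sum(s[i] for s in memory.states) / n for i in range(dim)]

memory = SharedLatentMemory()
agent_a_think(memory, prompt_state=[1.0, -2.0, 0.5])
refined = agent_b_refine(memory)
```

The point of the sketch is the data path: Agent B reads Agent A's continuous states directly from shared memory, so nothing is lost to tokenization and nothing has to be generated word by word.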

This mechanism ensures a “lossless information transfer” without the computational cost of generating and decoding lengthy text. The paper theoretically proves that latent collaboration can be orders of magnitude more efficient than text-based reasoning for achieving the same level of expressiveness.
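The intuition behind that efficiency argument can be made concrete with a back-of-envelope capacity comparison. The vocabulary size, hidden dimension, and precision below are typical values chosen for illustration, not figures from the paper:

```python
import math

# Back-of-envelope: raw information capacity per generation step.
# Illustrative values only -- not the paper's model configuration.
vocab_size = 128_000      # typical modern LLM vocabulary size
hidden_dim = 4096         # typical last-layer hidden-state width
bits_per_float = 16       # fp16 activations

# A discrete text token can carry at most log2(|V|) bits.
bits_per_text_token = math.log2(vocab_size)        # ~17 bits

# A continuous hidden state carries hidden_dim floats per step.
bits_per_latent_state = hidden_dim * bits_per_float  # 65,536 bits

ratio = bits_per_latent_state / bits_per_text_token
```

The raw ratio overstates the *usable* information, since hidden states are highly redundant, but it illustrates why far fewer latent steps can match the expressiveness of a long chain of text tokens.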

Empirical evaluations across nine complex reasoning benchmarks—including math (GSM8K), science (MedQA), and code generation—confirmed these efficiency gains. On the challenging AIME25 math task, for instance, LatentMAS improved accuracy by up to 3.3% over text-based MAS while reducing output token usage by over 74% and achieving a 3.5x speedup.

A case study on a multi-step math word problem highlighted the qualitative advantage. The traditional text-based MAS generated lengthy, convoluted intermediate steps, resulting in compounded reasoning errors and ultimately the wrong answer. In contrast, the continuous, high-fidelity transfer of latent working memory allowed subsequent LatentMAS agents to refine and correct upstream reasoning seamlessly, leading to a coherent path and the correct solution.

The framework improves system-level reasoning accuracy by an average of 13.3% compared to using a single LLM, demonstrating that latent collaboration not only saves time but fundamentally enhances the quality and stability of multi-agent reasoning.

LatentMAS represents a paradigm shift for agentic AI, enabling agents to operate as tightly integrated components that communicate at the speed of thought, moving beyond the constraints of human language.