AI Papers Reader

Personalized digests of latest AI research


Squeez: The AI Filter That Cuts Through the Noise of Coding Logs

Coding agents—AI systems designed to write, debug, and manage software—are often heralded as the future of software engineering. However, these digital developers face a persistent “drinking from a firehose” problem. When an agent runs a command to check a server log or search a codebase, it often receives thousands of lines of data, only a handful of which actually contain the answer it needs.

A new research paper from KR Labs introduces Squeez, a specialized tool designed to solve this “noise” problem. By using a compact, fine-tuned language model, Squeez can prune away 92% of irrelevant tool output while preserving the critical evidence an agent needs to make its next move.

The Cost of Clutter

In the world of AI agents, “context” is currency. Every line of a 500-line build log or a massive git log output that an agent reads costs money and processing time. More importantly, long, noisy inputs can confuse even the most advanced models, leading to “hallucinations” or errors in logic.

The researchers argue that while general-purpose models like GPT-4 are great at reasoning, they shouldn’t be the ones doing the “grunt work” of filtering raw data. Instead, they propose a “task-conditioned tool-output pruner.”

To understand how this works, imagine an agent trying to fix a container crash. It runs a command like kubectl logs and receives a 250-line wall of text. The agent’s specific query is: “Find the reason the container was killed.” Instead of handing the agent all 250 lines, Squeez identifies the two specific lines—Reason: OOMKilled and Exit Code: 137—and discards the rest.
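The behavior described above can be sketched as a function that takes the agent's query plus raw tool output and returns only the verbatim lines judged relevant. This is a toy keyword-matching stand-in for illustration; the actual system uses a fine-tuned language model, and the `prune` function and its signature are assumptions, not the paper's API.

```python
# Hypothetical sketch of a task-conditioned tool-output pruner.
# The real Squeez uses a fine-tuned LM; here a keyword filter
# stands in to show the input/output contract: query + raw text
# in, a short list of verbatim lines out.

def prune(query: str, tool_output: str, keywords=None) -> list[str]:
    """Keep only lines matching query-derived keywords (toy stand-in)."""
    if keywords is None:
        keywords = [w.lower() for w in query.split() if len(w) > 3]
    kept = []
    for line in tool_output.splitlines():
        lowered = line.lower()
        if any(k in lowered for k in keywords):
            kept.append(line)
    return kept

# A miniature version of the 250-line kubectl logs wall of text:
logs = """\
Container started at 2024-01-01T00:00:00Z
Last State: Terminated
Reason: OOMKilled
Exit Code: 137
Restart Count: 4"""

evidence = prune("Find the reason the container was killed", logs,
                 keywords=["reason", "exit code"])
print(evidence)  # ['Reason: OOMKilled', 'Exit Code: 137']
```

The key point is the contract, not the filtering logic: the agent's next step sees two lines of evidence instead of the full log.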

Small Model, Big Impact

To build Squeez, the researchers created a massive benchmark of over 11,000 examples based on real-world software engineering tasks (using the popular SWE-bench dataset) and synthetic data across various programming ecosystems like Go, Rust, and Kubernetes.

They fine-tuned a relatively tiny model—Qwen 3.5 with just 2 billion parameters—to perform this pruning. Despite its small size, Squeez-2B outperformed models 18 times its size, such as the Qwen 35B model, in identifying relevant information.

The results were stark: Squeez reached a recall of 0.86, meaning it successfully captured the "needle in the haystack" most of the time, even while removing nearly all the surrounding "hay." By comparison, traditional search methods (like BM25) and simple heuristics (like just looking at the first or last few lines) failed to provide the necessary precision.
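Recall here can be read as line-level: of the gold "needle" lines an agent actually needs, what fraction survives pruning? A minimal sketch, with illustrative numbers that are not taken from the paper:

```python
# Line-level recall: fraction of gold "needle" lines preserved
# after pruning. Illustrative example, not data from the paper.

def recall(kept_lines, gold_lines):
    gold = set(gold_lines)
    if not gold:
        return 1.0
    return len(gold & set(kept_lines)) / len(gold)

kept = {"Reason: OOMKilled", "Exit Code: 137", "Last State: Terminated"}
gold = {"Reason: OOMKilled", "Exit Code: 137"}
print(recall(kept, gold))  # 1.0: both needle lines survived
```

A recall of 0.86 means the pruner keeps the needed lines in the large majority of cases even while discarding roughly 92% of the output, which is exactly the trade-off the first-lines/last-lines heuristics fail to make.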

Why Verbatim Matters

One key distinction of Squeez is that it is “extractive” rather than “abstractive.” It does not summarize what happened; it provides the verbatim lines from the source. This is crucial for coding agents because a single character change in a stack trace or a specific version number in a dependency log can be the difference between a successful fix and a broken system.

For example, when tasked with finding a specific commit in a git log related to a “dimension-order change,” Squeez selects the exact commit entry. Larger, general-purpose models often “hallucinate” or select a plausible-looking but incorrect commit nearby.

A New Standard for AI Workflows

The researchers have released the model, the dataset, and a command-line interface (CLI) tool under an open-source license. Because Squeez is so small, it can be run cheaply and quickly, acting as a “pre-filter” for more expensive models like Claude or GPT-4.
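The pre-filter arrangement is a simple pipeline: the cheap small model sits between the tool and the expensive reasoning model. A minimal sketch of that data flow, where `run_tool`, `small_pruner`, and `big_model` are hypothetical placeholder callables, not the released CLI's interface:

```python
# Hypothetical pre-filter pipeline: a small pruner trims raw tool
# output before the expensive reasoning model ever sees it.
# All three callables are placeholders for illustration.

def agent_step(query, run_tool, small_pruner, big_model):
    raw = run_tool()                      # e.g. 250 lines of kubectl logs
    evidence = small_pruner(query, raw)   # cheap small-model filter
    return big_model(query, evidence)     # costly model sees only evidence

# Toy stand-ins to show the flow end to end:
answer = agent_step(
    "Why was the container killed?",
    run_tool=lambda: "line1\nReason: OOMKilled\nline3",
    small_pruner=lambda q, raw: [l for l in raw.splitlines() if "OOMKilled" in l],
    big_model=lambda q, ev: f"Diagnosis based on: {ev}",
)
print(answer)
```

Because the pruner runs on every tool call while the large model only consumes the survivors, the per-step token cost falls roughly in proportion to the lines pruned.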

By allowing AI agents to focus only on the evidence that matters, Squeez represents a significant step toward making autonomous coding more efficient, accurate, and cost-effective. The message for developers of agentic systems is clear: sometimes, to think bigger, your AI needs to see less.