
AI Swarms: New Benchmark Tests Language Models' Swarm Intelligence

Large language models (LLMs) are making waves in AI, showing promise in complex reasoning. However, a key question remains: can they effectively coordinate in decentralized multi-agent systems (MAS) like natural swarms, where agents have limited perception and communication?

To answer this, researchers at Gaoling School of Artificial Intelligence, Renmin University of China, have introduced SwarmBench, a novel benchmark designed to evaluate the swarm intelligence capabilities of LLMs acting as decentralized agents. Their work highlights the challenges of achieving coordination under strict constraints, mirroring the conditions found in natural swarms.

SwarmBench features five foundational MAS coordination tasks:

  • Pursuit: Agents collaboratively track and corner a faster-moving prey, testing their ability to coordinate containment strategies. Imagine a pack of LLM-controlled wolves trying to encircle a rabbit in a forest with limited visibility and noisy communication.
  • Synchronization: Agents synchronize an internal binary state, assessing their ability to form a consensus using local cues and communication. It’s like a group of fireflies trying to synchronize their flashes in a dark meadow, relying on each other’s light signals (a toy simulation of this dynamic follows the list).
  • Foraging: Agents navigate an environment to find food and transport it to a nest, evaluating exploration, pathfinding, and task allocation. Think of a colony of LLM-driven ants exploring a maze to find sugar and bring it back to their anthill, efficiently dividing the labor.
  • Flocking: Agents move as a cohesive group, maintaining alignment and separation, which probes emergent formation control and coordinated movement. Picture a flock of LLM-powered birds flying in formation through a forest, dynamically adjusting to avoid obstacles and stay together.
  • Transport: Agents cooperate to push a large object towards a goal area, which tests coordinated force application and navigation around obstacles. It’s like a team of LLM-operated robotic ants pushing together to move a large stone out of their path, overcoming its weight.
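
To make the synchronization task concrete, here is a minimal toy simulation of the kind of local consensus dynamic the LLM agents must discover for themselves. This is not SwarmBench code: the ring topology, neighbor structure, and majority-vote rule are illustrative assumptions, standing in for behavior an LLM agent would have to produce from its prompt alone.

```python
import random

def step(states, neighbors):
    """One synchronous round: each agent adopts the majority state
    among the neighbors it can perceive, keeping its own state on ties."""
    new_states = {}
    for agent, state in states.items():
        visible = [states[n] for n in neighbors[agent]]
        ones = sum(visible)
        if ones * 2 > len(visible):
            new_states[agent] = 1
        elif ones * 2 < len(visible):
            new_states[agent] = 0
        else:
            new_states[agent] = state  # tie (or no neighbors): keep state
    return new_states

# Toy run: 10 agents on a ring, each perceiving only its two neighbors.
random.seed(0)
states = {i: random.randint(0, 1) for i in range(10)}
neighbors = {i: [(i - 1) % 10, (i + 1) % 10] for i in range(10)}
for _ in range(20):
    states = step(states, neighbors)
print(states)  # pockets of agreement emerge purely from local information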

The SwarmBench framework utilizes a configurable 2D grid environment where agents operate with a restricted local view and minimal communication, forcing reliance on local cues and implicit coordination. The research team evaluated several leading LLMs (e.g., deepseek-v3, o4-mini) in a zero-shot setting and found significant performance variation across tasks, underscoring the difficulty posed by local-information constraints.
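
For a concrete feel of this setup, here is a minimal sketch of how a restricted local view might be carved out of the grid and rendered into an agent's prompt. The function names, map symbols, and prompt wording are assumptions invented for illustration, not taken from the SwarmBench codebase.

```python
def local_view(grid, x, y, radius=2):
    """Return the (2*radius+1)x(2*radius+1) window around (x, y),
    padding cells outside the map with walls ('#')."""
    h, w = len(grid), len(grid[0])
    view = []
    for dy in range(-radius, radius + 1):
        row = []
        for dx in range(-radius, radius + 1):
            ny, nx = y + dy, x + dx
            row.append(grid[ny][nx] if 0 <= ny < h and 0 <= nx < w else "#")
        view.append("".join(row))
    return view

def build_prompt(view, inbox):
    """Render the local view and any received messages into an LLM prompt."""
    return (
        "You are agent A on a 2D grid. Your local view "
        "('A'=you, 'W'=teammate, '#'=wall, '.'=empty):\n"
        + "\n".join(view)
        + f"\nMessages from nearby agents: {inbox}\n"
        "Reply with one move (UP/DOWN/LEFT/RIGHT/STAY) "
        "and an optional short message."
    )

# Toy 6x6 map: 'A' is this agent, 'W' a teammate two cells north.
grid = [list(row) for row in [
    "......",
    "..W...",
    "......",
    "..A...",
    "......",
    "......",
]]
print(build_prompt(local_view(grid, x=2, y=3), inbox=["W: prey sighted north"]))
```

Because each agent sees only such a window plus a few short messages, any global behavior, whether encirclement, consensus, or formation flight, has to emerge from purely local decisions.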

The researchers discovered that while LLMs exhibited potential for basic coordination, they struggled with long-range planning and robust spatial reasoning under uncertainty. Successful coordination depended heavily on emergent strategies formed from local information. They also observed that behavioral flexibility and efficiency were significant drivers of performance.

SwarmBench is released as an open-source toolkit, comprising environments, prompts, evaluation scripts, and comprehensive datasets, with the aim of fostering reproducible research into LLM-based MAS coordination and the theoretical underpinnings of Embodied MAS. The code repository is available at https://github.com/x66ccff/swarmbench.

This research provides a dedicated platform to measure progress and guide future research towards developing LLMs capable of genuine collective intelligence in decentralized settings, crucial for applications ranging from autonomous robotic swarms to distributed computational networks.