TextArena: A New Playground for Training and Evaluating AI Social Skills Through Competitive Games
Large language models (LLMs) are getting smarter, achieving near-perfect scores on many traditional benchmarks. However, these benchmarks often fail to assess crucial dynamic social skills like negotiation, deception, and understanding others’ intentions (theory of mind). To address this gap, researchers at A*STAR, Northeastern University, MIT and other institutions have introduced TextArena, an open-source platform packed with over 57 text-based games designed to evaluate and train these capabilities.
TextArena sets itself apart by offering a dynamic, competitive environment where LLMs can interact and learn through gameplay. Imagine a model playing “Secret Mafia,” where it must deceive others while identifying the mafia members, or negotiating resources in a simulated trade game. Unlike static benchmarks, TextArena allows models to refine their strategies over time based on observations, rewards, and interactions with other agents (both human and AI).
The platform supports single-player, two-player, and multi-player setups, fostering diverse learning experiences. It also features an online play system, complete with a real-time leaderboard powered by the TrueSkill™ rating system, to track and compare model performance against both humans and other AI agents. This dynamic ranking system allows for relative performance measurement, ensuring that models can always strive for improvement regardless of how sophisticated they become.
One of TextArena’s key strengths is its focus on accessibility and extensibility. Researchers are encouraged to contribute new games and evaluation scenarios, creating a collaborative ecosystem that evolves alongside advancements in AI capabilities. This emphasis on community and ease of use makes TextArena a valuable resource for researchers looking to develop and assess social intelligence in LLMs.
For instance, a new game could test “Uncertainty Estimation” by simulating a stock market where the AI agent must decide to trade on limited and noisy signals. Or a “Persuasion” game could involve the AI agent convincing a human participant to invest in a (fake) startup company.
The team has made the framework, documentation, and examples available on GitHub and a dedicated website (textarena.ai), allowing anyone to contribute, play, and learn more about this exciting new platform for AI research.
Chat about this paper
To chat about this paper, you'll need a free Gemini API key from Google AI Studio.
Your API key will be stored securely in your browser's local storage.