AI Papers Reader

Personalized digests of latest AI research


VerifiAgent: AI Tool Unifies Verification of Language Model Reasoning

Large language models (LLMs) are powerful reasoning tools, but they sometimes produce inaccurate results. Current methods for verifying LLM outputs are often tailored to specific models or tasks and can be computationally expensive, making them hard to scale across diverse reasoning challenges. Researchers from Monash University and VinUniversity have introduced “VerifiAgent,” a novel approach to address these limitations.

VerifiAgent is designed as a unified verification agent with a two-layered structure. First, “meta-verification” assesses the completeness and consistency of the LLM’s response. Think of it like a quick sanity check: Does the answer address all parts of the question, and do the steps make logical sense? If this initial check fails, VerifiAgent stops, preventing flawed reasoning from proceeding to the next stage.
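A minimal sketch of what such a sanity check might look like in code; the helper names and prompt wording are illustrative placeholders, not the paper's actual implementation:

```python
# Hypothetical sketch of the meta-verification step.
# `call_llm` is a placeholder for a chat-completion call to the backbone LLM.

def call_llm(prompt: str) -> str:
    """Placeholder: send a prompt to the backbone LLM and return its reply."""
    raise NotImplementedError

def meta_verify(question: str, response: str) -> bool:
    """Quick sanity check: is the response complete and logically consistent?

    Returning False short-circuits verification, so flawed reasoning
    never reaches the tool-based stage.
    """
    prompt = (
        "You are a verifier.\n"
        f"Question: {question}\n"
        f"Candidate answer: {response}\n"
        "Does the answer address every part of the question, and do its "
        "reasoning steps follow logically? Reply with PASS or FAIL."
    )
    verdict = call_llm(prompt)
    return verdict.strip().upper().startswith("PASS")
```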

Second, “tool-based adaptive verification” comes into play. This layer is where VerifiAgent really shines. It intelligently selects the best external tool to verify the answer based on the type of reasoning involved. For example, if the LLM has just solved a math problem, VerifiAgent uses a Python interpreter to re-run the arithmetic as code and confirm the result. For a question about factual knowledge, such as “which country has the largest population,” it employs a search engine to compare the LLM’s answer against real-world data. For questions requiring logical steps, it uses a symbolic solver. This “adaptive” design lets VerifiAgent verify a wide range of reasoning tasks efficiently and robustly.
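One way to picture this dispatch, as a hedged sketch: the tool functions and the routing rule below are illustrative stand-ins, since in the actual agent the LLM itself decides which tool to invoke.

```python
# Illustrative sketch of tool-based adaptive verification.
import subprocess

def run_python(code: str) -> str:
    """Execute verification code in a subprocess and return its stdout."""
    result = subprocess.run(
        ["python", "-c", code], capture_output=True, text=True, timeout=30
    )
    return result.stdout.strip()

def web_search(query: str) -> str:
    """Placeholder for a search-engine lookup (hypothetical helper)."""
    raise NotImplementedError

def symbolic_solve(formula: str) -> str:
    """Placeholder for a symbolic/logic solver call (hypothetical helper)."""
    raise NotImplementedError

def verify_with_tool(reasoning_type: str, payload: str) -> str:
    """Dispatch to the tool best suited to the reasoning type."""
    if reasoning_type == "math":
        return run_python(payload)      # re-run the arithmetic as code
    if reasoning_type == "factual":
        return web_search(payload)      # compare against real-world data
    if reasoning_type == "logical":
        return symbolic_solve(payload)  # check the logical derivation
    return "no suitable tool"
```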

The researchers tested VerifiAgent on mathematical, logical, commonsense and hybrid reasoning tasks, comparing it against existing verification methods like deductive and backward verifiers. Results showed that VerifiAgent consistently outperformed these baselines, demonstrating superior verification accuracy. Moreover, by leveraging feedback from the verification process, VerifiAgent can enhance the reasoning accuracy of the underlying LLM itself.

The paper also demonstrates that VerifiAgent can be effectively applied to inference scaling. Traditional methods for improving LLM performance at inference time generate many candidate responses and then select the best one, which requires substantial compute. The study found that VerifiAgent can achieve better results at lower computational cost: the agent decides when to stop sampling, continuing until a response passes verification or the maximum number of samples is reached. This significantly improves the accuracy of the large language model’s outputs.
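A rough sketch of this verification-guided sampling loop, under the assumption of placeholder `generate` and `verify` functions standing in for the backbone LLM and the full VerifiAgent check:

```python
# Sketch of sample-until-verified inference scaling (names are illustrative).

def generate(question: str) -> str:
    """Placeholder: draw one sampled response from the backbone LLM."""
    raise NotImplementedError

def verify(question: str, response: str) -> bool:
    """Placeholder: full VerifiAgent check (meta + tool-based verification)."""
    raise NotImplementedError

def sample_until_verified(question: str, max_samples: int = 8) -> str:
    """Sample responses one at a time, stopping at the first verified one."""
    last = ""
    for _ in range(max_samples):
        last = generate(question)
        if verify(question, last):
            return last  # early exit keeps compute cost low
    return last          # fall back to the final sample if none verifies
```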

The code for VerifiAgent is publicly available, facilitating further research and development in this crucial area of LLM validation. The researchers note that as backbone LLMs improve, VerifiAgent’s performance should improve with them, offering a scalable path to more reliable and trustworthy LLMs across diverse applications.