InfinityMATH: A Scalable Instruction Tuning Dataset for Programmatic Mathematical Reasoning

Large language models (LLMs) are becoming increasingly adept at mathematical reasoning, but there’s a critical need for scalable datasets that effectively train these models to tackle complex problems. Existing methods for creating such datasets often require substantial seed data and expensive computation, hindering scalability.

This paper introduces InfinityMATH, a novel dataset designed to overcome these limitations and enable more efficient instruction tuning for programmatic mathematical reasoning. At the heart of InfinityMATH lies a clever data synthesis pipeline that separates numerical values from the core structure of mathematical problems. This approach allows for generating numerous variations of a problem by simply changing the numerical values, while preserving the underlying reasoning logic.

For instance, consider a word problem asking for the total cost of wallpaper: “A hand-painted wallpaper costs $400 at the market. A DIY will save 20% after considering the materials cost. If Ethan made his own hand-painted wallpaper, how much was the total cost?”

The InfinityMATH pipeline first identifies and extracts the numerical values ($400 and 20%) from the problem, replacing them with placeholders. Then, an LLM (like GPT-4) is prompted to generate a program that solves the generalized problem, independent of the specific numerical values. This program might look something like:

market_cost = 400   # masked value ($400); treated as a placeholder by the pipeline
savings = 0.20      # masked value (20%); treated as a placeholder by the pipeline
total_cost = market_cost * (1 - savings)
print(total_cost)   # 320.0
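
The masking step itself can be approximated with a simple pattern match. The sketch below is illustrative only; the mask_numbers helper and its regex are our own assumptions, not the paper's implementation, which may delegate this step to an LLM:

import re

def mask_numbers(problem):
    """Replace each numeric literal with a named placeholder.

    Returns the masked problem text plus the extracted values.
    Hypothetical sketch; not the paper's actual masking code.
    """
    values = []
    def repl(match):
        values.append(match.group(0))
        return "{number" + str(len(values)) + "}"
    # Match dollar amounts, percentages, and bare numbers.
    masked = re.sub(r"\$?\d+(?:\.\d+)?%?", repl, problem)
    return masked, values

problem = ("A hand-painted wallpaper costs $400 at the market. "
           "A DIY will save 20% after considering the materials cost.")
masked, values = mask_numbers(problem)
print(masked)   # ... costs {number1} ... save {number2} ...
print(values)   # ['$400', '20%']

Running this on the example problem yields the placeholders {number1} and {number2} alongside the extracted values, which is the form the generalized program above operates on.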

Finally, the InfinityMATH pipeline repopulates the placeholder variables with a wide range of numerical values, creating a vast number of variations of the original problem while preserving the logical structure of the solution. Because each new variant requires only numeric substitution rather than a fresh LLM call, the dataset scales efficiently.
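
As a rough sketch of this repopulation step (the template, variable names, and sampling ranges below are our own illustrative assumptions, not the paper's):

import random

# Generalized solution program as a text template; {number1} and
# {number2} correspond to the masked values in the problem statement.
program_template = (
    "market_cost = {number1}\n"
    "savings = {number2}\n"
    "total_cost = market_cost * (1 - savings)\n"
    "print(total_cost)\n"
)

variants = []
for _ in range(5):
    # Sample fresh numbers for each variant; ranges are illustrative.
    values = {"number1": random.randint(100, 1000),
              "number2": round(random.uniform(0.05, 0.5), 2)}
    variants.append(program_template.format(**values))

# Each variant is a runnable program with identical reasoning structure.
exec(variants[0])

Every instantiated program shares the same solution logic, so the model sees many numerically distinct but structurally identical training examples.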

Experiments show that models fine-tuned with InfinityMATH significantly outperform those trained on existing datasets, particularly when tested on problems with varied numerical values. This suggests that InfinityMATH effectively addresses the challenge of “logical inconsistencies in reasoning,” where slight changes in numbers can disrupt the logical flow of a solution program generated by LLMs.

InfinityMATH offers a promising approach for scaling instruction tuning datasets for mathematical reasoning. By decoupling numbers from the core problem structure, it supports the development of robust, scalable LLMs that can tackle a wider range of mathematical challenges. The dataset is publicly available and can be readily integrated into existing instruction tuning frameworks, accelerating progress in programmatic mathematical reasoning.