AI Papers Reader

Personalized digests of latest AI research

Xiaoduo AI Lab Unveils Xmodel-2: A 1.2 Billion-Parameter LLM for Reasoning Tasks

Beijing, China – Xiaoduo AI Lab has released Xmodel-2, a 1.2-billion-parameter large language model (LLM) designed specifically for complex reasoning tasks. The model reports state-of-the-art performance among similarly sized models on a range of benchmarks, while keeping training cost relatively low. Its architecture and training techniques target efficient LLM design, making the model accessible to a broader range of researchers and practitioners.

Xmodel-2’s key innovation lies in its architecture. Unlike many LLMs, whose hyperparameters must be re-tuned as model size changes, Xmodel-2 uses a unified set of hyperparameters across different scales. Experiments conducted on smaller models can therefore be transferred directly to larger ones, significantly reducing the time and resources required for optimization. Think of it as building a small prototype car, testing it extensively, and then applying what was learned directly to a larger, more capable car.

To further improve training efficiency and stability, Xmodel-2 employs the Warmup-Stable-Decay (WSD) learning rate scheduler, originally used in the MiniCPM model. The scheduler ramps the learning rate up during a short warmup, holds it constant through a long stable phase, and then lowers it during a final decay phase, which helps prevent instability and improve convergence. The researchers tuned the decay phase in particular, leading to substantial improvements in reasoning performance.
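As a rough illustration of the three WSD phases, here is a minimal Python sketch. The function name, the parameter names, the linear warmup, the exponential decay shape, and the default `final_lr_ratio` are all assumptions made for this example; the article does not state the exact settings Xmodel-2 used.

```python
def wsd_learning_rate(step, peak_lr, warmup_steps, stable_steps,
                      decay_steps, final_lr_ratio=0.1):
    """Illustrative Warmup-Stable-Decay schedule (hypothetical helper).

    Assumptions: linear warmup, constant plateau, and an exponential
    decay from peak_lr down to peak_lr * final_lr_ratio.
    """
    if step < warmup_steps:
        # Warmup: ramp linearly from peak_lr / warmup_steps up to peak_lr.
        return peak_lr * (step + 1) / warmup_steps
    if step < warmup_steps + stable_steps:
        # Stable: hold the learning rate at its peak value.
        return peak_lr
    # Decay: interpolate geometrically toward the final learning rate.
    steps_into_decay = step - warmup_steps - stable_steps
    progress = min(1.0, steps_into_decay / decay_steps)
    return peak_lr * (final_lr_ratio ** progress)


if __name__ == "__main__":
    # Print the learning rate at a few points of a toy 160-step run.
    for s in (0, 9, 50, 110, 135, 160):
        print(s, wsd_learning_rate(s, 1.0, 10, 100, 50))
```

One appeal of this shape is that the long stable phase leaves a checkpoint from which different decay-phase settings can be tried without repeating the earlier training.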

The model was pre-trained on 1.5 trillion tokens sourced from diverse open datasets, including text and code, which contributes to its robust performance across tasks. Training was divided into two stages: a stable training phase and a decay phase. The decay phase incorporated high-quality supervised fine-tuning (SFT) data, with the ratio of SFT data to pre-training data optimized through extensive experimentation (around 64% SFT data). This optimization significantly improved complex reasoning performance (a 29.31% increase over the baseline). Among the data combined in this stage, instruction-formatted data in mathematics and code proved particularly beneficial for complex reasoning.
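To make the decay-phase data mixing concrete, the sketch below draws training batches from two pools at a fixed SFT ratio. It is only an illustration: the function name, the per-example sampling, and the toy data are assumptions, and the article does not say whether the roughly 64% ratio was measured per example or per token.

```python
import random


def sample_decay_batch(pretrain_examples, sft_examples, sft_ratio,
                       batch_size, rng):
    """Draw a mixed batch for the decay phase (hypothetical helper).

    Each slot is filled from the SFT pool with probability sft_ratio,
    otherwise from the pre-training pool. Assumption: the ratio is
    applied per example, not per token.
    """
    batch = []
    for _ in range(batch_size):
        pool = sft_examples if rng.random() < sft_ratio else pretrain_examples
        batch.append(rng.choice(pool))
    return batch


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the toy run is repeatable
    pretrain = [("pretrain", i) for i in range(100)]
    sft = [("sft", i) for i in range(100)]
    batch = sample_decay_batch(pretrain, sft, sft_ratio=0.64,
                               batch_size=10_000, rng=rng)
    sft_share = sum(1 for source, _ in batch if source == "sft") / len(batch)
    print(f"SFT share in batch: {sft_share:.3f}")
```

Sweeping `sft_ratio` over several values and comparing downstream reasoning scores is one simple way such a ratio could be searched for.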

The researchers evaluated Xmodel-2 on a range of benchmarks. On commonsense reasoning tasks run through the Language Model Evaluation Harness (e.g., ARC-Challenge, BoolQ, SciQ), it outperformed or matched the majority of comparable models. On complex reasoning tasks (GSM8K, MATH, BBH, MMLU, HumanEval, MBPP), Xmodel-2 again achieved state-of-the-art performance among similarly sized models. Its performance on agent-based tasks (HotpotQA, FEVER, AlfWorld, WebShop) also demonstrated its potential for real-world applications such as customer service and task automation. For example, on FEVER, which involves verifying factual claims, Xmodel-2 significantly outperformed previous models in its category. On tasks involving reasoning across multiple documents, it also showed a marked improvement.

The Xiaoduo AI Lab has made Xmodel-2’s model checkpoints and code publicly available on GitHub, encouraging further research and development within the community. The researchers also documented the hyperparameters extensively, which supports reproducible research and helps others apply the lessons learned in building Xmodel-2. The release marks a step towards more accessible and efficient LLMs designed for reasoning tasks, opening up new possibilities for both research and practical applications.