Alibaba Releases Powerful Code-Specific Language Model Qwen2.5-Coder

Alibaba has released Qwen2.5-Coder, a significant upgrade to its previous code-specific model, CodeQwen1.5. The new model comes in two sizes, 1.5B and 7B parameters, and showcases impressive performance across a wide range of code-related tasks.
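For readers who want to try the model, the checkpoints are published on Hugging Face. The sketch below loads the instruct variant with the transformers library; the model ID and chat-template call follow standard Qwen conventions, so treat this as a hedged starting point and check the official model card before relying on the details.

```python
# Minimal sketch: generating code with Qwen2.5-Coder via Hugging Face Transformers.
# The model ID and chat-template usage follow standard Qwen conventions; verify
# against the official model card.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2.5-Coder-7B-Instruct"  # assumed Hub ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [
    {"role": "user", "content": "Write a Python function that checks if a number is prime."}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```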

Qwen2.5-Coder is trained on a corpus of more than 5.5 trillion tokens, one of the largest datasets assembled for a code-specific model. The corpus combines code from public repositories such as GitHub with large-scale web-crawled, code-related text. The training pipeline applies careful data cleaning and filtering to keep quality high, and balances the mix of code, math, and general text.
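The report does not spell out every filter, but cleaning pipelines of this kind usually pair learned quality classifiers with simple file-level heuristics. The sketch below illustrates the heuristic half; the thresholds are illustrative assumptions, not values from the paper.

```python
# Illustrative rule-based pre-filter for raw source files, in the spirit of the
# cleaning stage described in the paper. Thresholds are assumptions for
# demonstration, not the values Qwen2.5-Coder actually uses.
def keep_source_file(text: str,
                     max_line_len: int = 1000,
                     min_alnum_ratio: float = 0.25,
                     max_file_bytes: int = 1_000_000) -> bool:
    if len(text.encode("utf-8")) > max_file_bytes:
        return False  # very large files are often generated or minified code
    lines = text.splitlines()
    if not lines:
        return False
    if max(len(line) for line in lines) > max_line_len:
        return False  # extremely long lines usually mean data blobs, not code
    alnum = sum(ch.isalnum() for ch in text)
    if alnum / max(len(text), 1) < min_alnum_ratio:
        return False  # mostly punctuation or binary-like content
    return True

assert keep_source_file("def add(a, b):\n    return a + b\n")
assert not keep_source_file("\x00" * 5000)
```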

The model has been evaluated on more than ten popular code-related benchmarks covering code generation, completion, reasoning, and repair, achieving state-of-the-art results among open-source models of comparable size. For example, on HumanEval, a code-generation benchmark, Qwen2.5-Coder-7B-Base scores 61.6%, surpassing even the much larger 33-billion-parameter DS-Coder-33B-Base.
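Scores like this are pass@1 rates: the fraction of problems for which a generated solution passes the benchmark's unit tests on the first sample. When several samples are drawn per problem, evaluations typically use the unbiased pass@k estimator from the original HumanEval paper (Chen et al., 2021); a direct implementation is sketched below.

```python
import math

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator (Chen et al., 2021).

    n: total samples generated for a problem
    c: number of samples that pass the unit tests
    k: sample budget being scored
    """
    if n - c < k:
        return 1.0  # every size-k subset contains at least one correct sample
    # 1 - C(n - c, k) / C(n, k), computed as a numerically stable product
    return 1.0 - math.prod(1.0 - k / i for i in range(n - c + 1, n + 1))

# For k = 1 the estimator reduces to the plain fraction of correct samples, c / n.
print(pass_at_k(200, 87, 1))  # 0.435
```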

The model also handles multiple programming languages well. On MultiPL-E, a benchmark that evaluates code generation across eight languages, Qwen2.5-Coder-7B-Instruct reaches an average accuracy of 76.5%, again outperforming larger models.
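MultiPL-E translates HumanEval-style problems into each target language and runs that language's own test harness, so the headline 76.5% is a mean over per-language pass rates. A schematic of that aggregation is below; the eight-language list follows common MultiPL-E reporting (an assumption to verify against the paper), and the scorer is a placeholder, not the real harness.

```python
# Schematic of MultiPL-E-style scoring: each language gets its own translated
# prompts and test harness, and the headline number is the mean of per-language
# pass@1 scores. Language list and scorer are illustrative placeholders.
LANGUAGES = ["python", "cpp", "java", "php", "typescript", "csharp", "bash", "javascript"]

def evaluate_language(model, language: str) -> float:
    """Placeholder: run the language's translated problems through its own
    unit-test harness and return pass@1 for that language."""
    raise NotImplementedError

def multipl_e_average(model) -> float:
    scores = {lang: evaluate_language(model, lang) for lang in LANGUAGES}
    return sum(scores.values()) / len(scores)
```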

The researchers also measured code reasoning with CRUXEval, a benchmark that asks the model to trace a program and predict its output from a given input, or to recover a consistent input from a given output. Qwen2.5-Coder-7B-Instruct scores 65.8% and 65.9% on the Input-CoT and Output-CoT tasks, respectively, markedly higher than other open-source models.
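Concretely, each CRUXEval item is a small Python function: in output prediction the model sees the function and an input and must state the return value; in input prediction it sees the function and an output and must supply an input that produces it. An illustrative item in that style (not drawn from the benchmark):

```python
# Illustrative CRUXEval-style item (not an actual benchmark problem).
def f(s: str) -> str:
    return "".join(ch for ch in s if ch.islower())

# Output prediction: given the input, state what f returns.
#   assert f("AbCdEf") == ??   -> correct answer: "bdf"
# Input prediction: given the output, supply a consistent input.
#   assert f(??) == "xyz"      -> e.g. "xAyBz" works, since f("xAyBz") == "xyz"
print(f("AbCdEf"))  # "bdf"
print(f("xAyBz"))   # "xyz"
```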

The model also performs well at code editing, as measured by the Aider benchmark, which tests how well a model works inside the Aider pair-programming tool: interpreting natural-language edit requests and producing correct, executable code changes. Qwen2.5-Coder-7B-Instruct achieves a Pass@1 accuracy of 50.4% on this benchmark, again ahead of larger models.

Alibaba’s open-source release of Qwen2.5-Coder is expected to have a significant impact on the research and development of code intelligence. The model’s impressive capabilities and permissive licensing will encourage broader adoption by developers in real-world applications.