Morgan Stanley Pioneers Q Programming Language Fine-Tuning for LLMs
New York, NY – August 12, 2025 – In a significant step toward making large language models (LLMs) proficient in specialized programming languages, Morgan Stanley researchers have unveiled a comprehensive, open-source framework for adapting LLMs to the Q programming language. Q, a niche but important language used in quantitative finance for high-performance analytics and time-series data, has historically posed a challenge for general-purpose AI models because of its unusual syntax and limited online presence.
The paper, titled “Technical Report: Full-Stack Fine-Tuning for the Q Programming Language,” details a multi-stage process to imbue LLMs with a strong understanding of Q. The effort culminates in a suite of Qwen-2.5-based models in five parameter sizes (1.5B, 3B, 7B, 14B, and 32B), which reportedly surpass even leading proprietary models such as Claude Opus-4 on a specially designed Q benchmark.
The core of the research lies in addressing the scarcity of Q-specific data. The team developed a “LeetCode-style” evaluation dataset for Q, enabling standardized testing and benchmarking. The dataset was built through an iterative, “model-in-the-loop” process in which LLMs generated both Q code solutions and corresponding test cases, while automated verification and human oversight ensured quality and guarded against “reward hacking,” in which models exploit loopholes in the evaluation.
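The report describes this verification loop at a high level rather than prescribing a particular harness, but the core check can be pictured as the Python sketch below: run each LLM-generated Q solution against its LLM-generated test cases under a q interpreter and keep the problem only if every test passes. The `run_q` helper, the command-line flags, and the test-case format here are illustrative assumptions, not details taken from the paper.

```python
import subprocess


def run_q(script: str, timeout: float = 10.0) -> str:
    """Run a Q script with a local `q` interpreter and return its stdout.

    Hypothetical helper: assumes a `q` binary is on PATH and that piping a
    script to it on stdin produces the evaluated output.
    """
    result = subprocess.run(
        ["q", "-q"],            # assumption: -q suppresses the startup banner
        input=script,
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    if result.returncode != 0:
        raise RuntimeError(result.stderr)
    return result.stdout.strip()


def passes_all_tests(solution: str, test_cases: list[tuple[str, str]]) -> bool:
    """Check an LLM-generated Q solution against LLM-generated test cases.

    Each test case is a (q_expression, expected_output) pair; a problem is
    kept for the dataset only if every test passes, which is what guards
    against low-quality generations and reward hacking.
    """
    for expression, expected in test_cases:
        script = f"{solution}\nshow {expression};\nexit 0;"
        try:
            if run_q(script) != expected:
                return False
        except (RuntimeError, subprocess.TimeoutExpired):
            return False
    return True
```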
“Even though large language models are becoming increasingly capable, it is still unreasonable to expect them to excel at tasks that are under-represented on the Internet,” the paper states. “Leveraging LLMs for specialized applications, particularly in niche programming languages and private domains, remains challenging and largely unsolved.”
The fine-tuning process involved three key stages:
- Pretraining: Models were first exposed to a broad corpus of Q code, including open-source repositories and official documentation, to learn the language’s syntax and idioms.
- Supervised Fine-Tuning (SFT): Models were then trained on the curated LeetCode-Q dataset, learning to translate between natural language descriptions, Python code, and Q code for specific algorithmic tasks.
- Reinforcement Learning (RL): Finally, reinforcement learning, specifically Group Relative Policy Optimization (GRPO), was used to further refine the models’ ability to generate correct and efficient Q code, aligning their behavior with the evaluation harness (a minimal sketch of the group-relative step follows this list).
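The defining ingredient of GRPO is that, for each training prompt, the model samples a group of candidate completions and each completion’s reward is normalized against its own group, removing the need for a separately learned value model. The framework-free Python sketch below shows only that normalization step; the group size, the binary pass/fail reward, and the function names are illustrative assumptions rather than the paper’s implementation.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Normalize rewards within one group of sampled completions.

    GRPO replaces a learned critic with this group baseline: each
    completion's advantage is its reward minus the group mean, divided by
    the group's standard deviation.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Illustrative use: 8 completions sampled for one Q problem, each rewarded
# 1.0 if the generated code passes the problem's tests in the evaluation
# harness and 0.0 otherwise (a simplifying assumption about the reward).
rewards = [1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)  # passing completions get positive advantages, failing ones negative
```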
The results are striking. The largest Qwen-2.5 model achieved a pass@1 accuracy of 59% on the Q benchmark, a significant improvement over Claude Opus-4’s 29.5%. Notably, all of the fine-tuned Qwen-2.5 models, even the smallest 1.5B variant, outperformed GPT-4.1 on this challenging task.
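For context, pass@1 is the fraction of benchmark problems for which a single sampled solution passes all of that problem’s test cases. When several samples are drawn per problem, a common way to report it is the unbiased estimator popularized by the HumanEval benchmark; whether the report uses this exact estimator is an assumption here.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: n samples drawn per problem, c of them correct.

    Estimates the probability that at least one of k samples chosen
    uniformly at random from the n drawn passes all tests.
    """
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


# Example: 10 samples for one problem, 3 of them pass -> estimated pass@1 = 0.3
print(pass_at_k(n=10, c=3, k=1))
```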
Beyond the models themselves, Morgan Stanley is releasing the complete codebase, datasets, and training scripts, providing a detailed blueprint for adapting LLMs to any domain with robust evaluation mechanisms. This open-source approach aims to lower the barrier for others to experiment and build upon their work, fostering advancements in LLM specialization for niche fields.
The researchers emphasize that while their LeetCode-style dataset mirrors Pythonic coding practices and may not fully represent the SQL-like, database-centric usage of Q in real-world finance, their methodology is broadly applicable. They envision this work as a foundation for future research into domains where data is scarce and specialized LLM capabilities are crucial.