Language Models Exhibit Language-Specific Knowledge, Can Reason Better in Some Languages Than English
In a new study, researchers at the University of Illinois Urbana-Champaign have discovered that large language models (LLMs) possess what they call “Language-Specific Knowledge” (LSK): they can sometimes reason and answer questions more accurately in a language other than English. This challenges the assumption that LLMs, which are often trained primarily on English data, perform best when reasoning in English.
The researchers define LSK as knowledge that an LLM appears to access more readily or represent more accurately when queried in a particular language, which they term the “expert language”. To illustrate this, they asked GPT-4 Turbo the same question in English, Hindi, and Arabic: “What is the tradition of one family giving another family gifts during marriage called?” The model’s responses varied significantly. When queried in Hindi, it accurately described “Dahej,” the dowry tradition in India in which the bride’s family gives wealth to the groom’s family. The Arabic query elicited a response about “Mahr,” an Islamic tradition in which the groom’s family gives wealth to the bride. The English query, however, produced a more generic answer, mentioning “dowry” and “bride price” without referencing specific cultural contexts.
The researchers hypothesize that LLMs, like humans who code-switch, may perform better in certain languages for specific topics because culturally specific texts are more abundant in those languages, allowing that knowledge to be encoded more richly. Think of it like asking a financial expert about tax law: you’d expect better insights from someone who specializes in that area.
To leverage this LSK, the researchers developed a two-stage framework called “LSKEXTRACTOR.” The first stage, “Mapping LSK,” identifies expert languages for different knowledge areas. It involves prompting LLMs in multiple languages to perform chain-of-thought reasoning on culture-specific questions. The language yielding the most accurate reasoning for each cluster of questions becomes the expert language for that topic.
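To make the mapping stage concrete, here is a minimal Python sketch of how it could work, assuming the culture-specific questions have already been clustered by topic. The helper callables `query_llm` and `is_correct`, the candidate language list, and the prompt wording are all illustrative assumptions, not the paper’s actual implementation.

```python
from collections import defaultdict

# Hypothetical candidate languages; the paper's set may differ.
CANDIDATE_LANGUAGES = ["English", "Hindi", "Arabic", "Chinese", "Swahili"]

def map_expert_languages(clusters, query_llm, is_correct):
    """For each question cluster, find the language in which
    chain-of-thought reasoning is most accurate.

    clusters: dict mapping cluster_id -> list of {"text": str, "gold": str}
    query_llm(prompt, language): calls the model, returns its answer string
    is_correct(answer, gold): returns True if the answer matches the gold label
    """
    expert_language = {}
    for cluster_id, questions in clusters.items():
        scores = defaultdict(int)
        for lang in CANDIDATE_LANGUAGES:
            for q in questions:
                # Prompt the model to reason step by step in `lang`
                # (illustrative wording, not the paper's exact prompt).
                prompt = (
                    f"Answer the following question. Think step by step "
                    f"in {lang} before giving a final answer.\n\n{q['text']}"
                )
                answer = query_llm(prompt, language=lang)
                scores[lang] += is_correct(answer, q["gold"])
        # The most accurate language becomes this cluster's expert language.
        expert_language[cluster_id] = max(scores, key=scores.get)
    return expert_language
```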
The second stage, “LSK-Informed Reasoning,” uses this mapping during inference. When a new question is posed, the framework identifies the relevant knowledge area and then prompts the LLM to reason in the corresponding expert language.
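The inference stage can be sketched the same way, continuing the assumptions above. The nearest-centroid routing by cosine similarity is an assumption about how a new question gets assigned to a knowledge area; the paper may assign clusters differently, and `embed` stands in for any sentence-embedding function.

```python
import numpy as np

def answer_with_lsk(question, expert_language, cluster_centroids,
                    embed, query_llm):
    """Route a new question to its cluster's expert language and answer there.

    expert_language: dict from map_expert_languages above
    cluster_centroids: dict mapping cluster_id -> embedding vector
    embed(text): returns an embedding vector for the text
    """
    q_vec = embed(question)

    def cosine(cid):
        c_vec = cluster_centroids[cid]
        return np.dot(q_vec, c_vec) / (
            np.linalg.norm(q_vec) * np.linalg.norm(c_vec)
        )

    # Assign the question to the nearest cluster by cosine similarity.
    best_cluster = max(cluster_centroids, key=cosine)
    lang = expert_language[best_cluster]

    # Reuse the same chain-of-thought prompt, now in the expert language.
    prompt = (
        f"Answer the following question. Think step by step in {lang} "
        f"before giving a final answer.\n\n{question}"
    )
    return query_llm(prompt, language=lang)
```

Note that both stages work purely through prompting: the routing step adds only an embedding lookup at inference time, which is consistent with the paper’s claim that the gains come without any fine-tuning.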
The team tested LSKEXTRACTOR on datasets of cultural and societal norms, using models such as Google’s Gemma, Meta’s Llama 3, and Microsoft’s Phi-3. Across models and datasets, LSKEXTRACTOR delivered an average relative improvement of 10% in accuracy. This gain, achieved without any additional fine-tuning, highlights the untapped multilingual strengths of existing LLMs. The paper notes that smaller models saw the largest improvements, suggesting they benefit the most from this technique.
This research opens avenues for developing more inclusive and culturally aligned language models. It demonstrates that embracing linguistic diversity and strategically using code-switching can unlock deeper knowledge and improve reasoning capabilities in LLMs. The authors have made their code publicly available to encourage further research in this area.