Unveiling the Secrets of Instruction Following in Large Language Models
A new study from researchers at the Hong Kong University of Science and Technology offers a detailed look into how large language models (LLMs) learn to follow instructions, shedding light on the internal mechanisms that enable these AI systems to understand and execute commands. The research focuses on identifying and analyzing specific neurons and “experts” within LLMs that become specialized for processing different types of instructions during the fine-tuning process.
The core idea is that fine-tuning LLMs for instruction following doesn’t just broadly improve the model; it cultivates specialized circuits within the network. These circuits are composed of sparse components—specific neurons in standard models and both neurons and “experts” in Mixture-of-Experts (MoE) models—that are highly responsive to certain types of instructions.
To investigate this, the researchers introduced HEXAINST, a new dataset meticulously designed to cover six distinct instruction categories: classification, coding, general question answering, generation, math, and summarization. This dataset is balanced to ensure each category receives equal representation, reducing biases that could skew the results.
The study also presents SPARCOM, a novel analytical framework designed to identify, evaluate, and compare these instruction-specific components. SPARCOM involves three key steps:
- Identification: Pinpointing the neurons or experts that are most actively involved when processing a particular type of instruction. For example, if you ask the model to “Write a Python function to calculate the factorial of a number,” certain neurons will consistently show a higher level of activity than others (a minimal sketch of this step follows the list).
- Evaluation: Assessing how general or unique these components are. Are the same neurons activated across different types of instructions, suggesting a general role in language processing? Or are they highly specific to the task at hand, indicating a specialized function?
- Comparison: Analyzing how these components change before and after fine-tuning. This helps to understand how the fine-tuning process reshapes the model’s internal workings.
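To make the identification step concrete, here is a minimal sketch of one way to find instruction-responsive MLP neurons: register forward hooks and count how often each neuron’s activation is positive on prompts from a single category. This is only an illustration under simple assumptions (GPT-2 as a stand-in model, a “fires when positive” heuristic, a toy prompt list), not the paper’s SPARCOM implementation.

```python
# Minimal sketch (not the paper's SPARCOM code): count how often each MLP neuron
# "fires" (positive activation) on prompts from one instruction category.
# GPT-2 is a stand-in; the firing heuristic and prompt set are illustrative choices.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

activation_counts = {}  # layer index -> per-neuron count of positive activations

def make_hook(layer_idx):
    def hook(module, inputs, output):
        # output: (batch, seq_len, intermediate_size) activations after the MLP nonlinearity
        fired = (output > 0).sum(dim=(0, 1))
        if layer_idx in activation_counts:
            activation_counts[layer_idx] += fired
        else:
            activation_counts[layer_idx] = fired.clone()
    return hook

# Hook the nonlinearity inside every MLP block (exposed as mlp.act in GPT-2).
handles = [
    block.mlp.act.register_forward_hook(make_hook(i))
    for i, block in enumerate(model.transformer.h)
]

coding_prompts = [  # toy stand-ins for one HEXAINST-style category
    "Write a Python function to calculate the factorial of a number.",
    "Implement binary search over a sorted list in Python.",
]
with torch.no_grad():
    for prompt in coding_prompts:
        model(**tokenizer(prompt, return_tensors="pt"))

for h in handles:
    h.remove()

# Neurons that fired most often are candidate "coding" neurons for this category.
top_neurons = {
    layer: torch.topk(counts.float(), k=10).indices.tolist()
    for layer, counts in activation_counts.items()
}
print(top_neurons[0])
```

In practice one would contrast these counts with activations on other categories or a general corpus, so that only neurons disproportionately active for the category of interest are kept, which is what the evaluation step then quantifies.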
The research found that while some neurons are indeed “generalists,” firing across a range of instruction types, others are highly specialized. For instance, the overlap heatmaps show much darker coloration along the diagonal for task types such as classification, summarization, coding, and math: neurons overlap heavily only between instructions of the same type, indicating a strong, consistent response to each of these tasks.
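One way to reproduce this kind of diagonal pattern, given per-instruction neuron sets from an identification step like the one sketched above, is to compute pairwise overlap, for example with Jaccard similarity. The sets below are invented toy values, not results from the paper.

```python
# Toy sketch: pairwise Jaccard overlap between per-instruction neuron sets.
# Same-category pairs should overlap far more than cross-category pairs,
# which is what the dark diagonal in the paper's heatmaps reflects.
def jaccard(a: set, b: set) -> float:
    return len(a & b) / len(a | b) if (a or b) else 0.0

# (layer, neuron) sets for two instructions per category; values are made up.
instruction_sets = {
    "classification_1": {(0, 12), (3, 7), (5, 101)},
    "classification_2": {(0, 12), (3, 7), (9, 40)},
    "coding_1": {(2, 5), (4, 88), (7, 3)},
    "coding_2": {(2, 5), (4, 88), (8, 61)},
}

names = list(instruction_sets)
for a in names:
    row = [f"{jaccard(instruction_sets[a], instruction_sets[b]):.2f}" for b in names]
    print(f"{a:<18}" + " ".join(row))
```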
One key finding is that fine-tuning doesn’t drastically alter the fundamental way a model processes instructions. The overall distribution of these specialized neurons across different layers of the network remains relatively consistent. However, fine-tuning does increase the number and capability of these specialized components, allowing the model to handle a wider range of tasks more effectively.
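As a rough illustration of the comparison step, the snippet below contrasts a specialized-neuron set identified in a base model with one identified in its instruction-tuned counterpart, reporting how many neurons are retained, how many are new, and how both sets distribute over layers. All numbers are invented for illustration.

```python
# Toy sketch of the before/after comparison; the (layer, neuron) sets are invented.
from collections import Counter

base_neurons = {(3, 7), (5, 101), (7, 3)}                     # identified in the base model
tuned_neurons = {(3, 7), (5, 101), (7, 3), (4, 88), (9, 55)}  # identified after fine-tuning

retained = base_neurons & tuned_neurons
added = tuned_neurons - base_neurons
print(f"before: {len(base_neurons)}, after: {len(tuned_neurons)}, "
      f"retained: {len(retained)}, newly specialized: {len(added)}")

# A similar per-layer distribution before and after would match the observation
# that fine-tuning grows the specialized set without relocating it across layers.
print("per-layer before:", dict(Counter(layer for layer, _ in base_neurons)))
print("per-layer after: ", dict(Counter(layer for layer, _ in tuned_neurons)))
```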
The researchers propose a three-phase mechanistic framework for understanding how these instruction-specific neurons work. In the early phase, a large number of neurons encode the basic concepts of the instruction. In the intermediate phase, the model generalizes and understands the instruction, and the number of active neurons shrinks. In the final phase, a smaller set of specialized neurons drives the generation of the appropriate output.
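To look for this pattern in a model, one could count, per layer, how many neurons cross an activation threshold for a given instruction (for example, from the hook-based counts sketched earlier) and compare early, middle, and late layers. The counts below are placeholder values, not measurements from the paper.

```python
# Toy sketch: bucket per-layer counts of active neurons into early/middle/late
# thirds to look for the many -> fewer -> few pattern. Counts are made up.
per_layer_active = [820, 790, 760, 510, 340, 300, 260, 150, 120, 90, 70, 60]

n = len(per_layer_active)
thirds = {
    "early": per_layer_active[: n // 3],
    "middle": per_layer_active[n // 3 : 2 * n // 3],
    "late": per_layer_active[2 * n // 3 :],
}
for phase, counts in thirds.items():
    print(f"{phase:>6}: mean active neurons per layer = {sum(counts) / len(counts):.0f}")
```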
These findings provide valuable insights into the inner workings of LLMs, paving the way for more interpretable and efficient model optimization.