Fine-tuning Foundation Models with Explained Variance Adaptation

Foundation models (FMs) are powerful tools for natural language processing, computer vision, and reinforcement learning, but their size and complexity make full fine-tuning expensive. Parameter-efficient fine-tuning (PEFT) methods such as Low-Rank Adaptation (LoRA) address this by adding a small number of new trainable parameters on top of the frozen pre-trained weights. LoRA typically initializes these new parameters randomly, but a recent paper shows that a data-driven initialization can significantly improve performance.
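
To make the setup concrete, here is a minimal sketch of a LoRA-style adapter layer in PyTorch. It is illustrative only, not the paper's code: the class name `LoRALinear` and the hyperparameters `rank` and `alpha` are placeholder choices, and it follows the common LoRA convention of a randomly initialized down-projection A and a zero-initialized up-projection B.

```python
# Minimal sketch of a LoRA-style adapter (illustrative, not the paper's code).
# The frozen weight W is augmented with a low-rank update B @ A; standard LoRA
# draws A at random and zero-initializes B so the update starts at zero.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, in_features, out_features, rank=8, alpha=16.0):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features),
                                   requires_grad=False)               # frozen W
        self.lora_A = nn.Parameter(torch.empty(rank, in_features))    # trainable, random init
        self.lora_B = nn.Parameter(torch.zeros(out_features, rank))   # trainable, zero init
        nn.init.kaiming_uniform_(self.lora_A)
        self.scaling = alpha / rank

    def forward(self, x):
        # Frozen path plus scaled low-rank update: x W^T + scaling * x A^T B^T
        return x @ self.weight.T + self.scaling * (x @ self.lora_A.T) @ self.lora_B.T
```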

The paper, “One Initialization to Rule them All: Fine-tuning via Explained Variance Adaptation”, introduces Explained Variance Adaptation (EVA), which uses information from the downstream task to initialize LoRA’s new parameters. EVA performs singular value decomposition (SVD) on minibatches of activation vectors to obtain their right-singular vectors. These vectors are sorted by the variance they explain, and the top-k vectors are used to initialize LoRA’s down-projection matrix. As a result, the new parameters start out capturing as much of the variance in the downstream-task activations as possible.
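
A rough sketch of this initialization idea is shown below. It assumes a single activation minibatch stands in for the SVD computed over several minibatches in the paper; the function name `eva_style_init` and the shapes are illustrative assumptions, not the authors' implementation.

```python
# Sketch of an EVA-style, data-driven LoRA initialization (assumptions: one
# activation minibatch `acts` of shape (batch, in_features) approximates the
# minibatch SVD described in the paper; `rank` components are kept per layer).
import torch

def eva_style_init(acts: torch.Tensor, rank: int) -> torch.Tensor:
    """Build a (rank, in_features) init matrix from the top right-singular
    vectors of an activation minibatch."""
    # SVD of the minibatch; rows of Vh are right-singular vectors.
    _, S, Vh = torch.linalg.svd(acts, full_matrices=False)
    explained = S**2 / (S**2).sum()             # fraction of variance per component
    order = torch.argsort(explained, descending=True)
    return Vh[order[:rank]]                     # top-k directions -> LoRA A init

# Usage sketch: overwrite the random A matrix of a LoRA layer, e.g.
#   layer.lora_A.data.copy_(eva_style_init(activation_batch, rank=8))
```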

EVA has been shown to improve performance across a wide range of domains, including language generation, language understanding, image classification, and reinforcement learning. For example, EVA outperforms competing methods on commonsense reasoning benchmarks such as BoolQ, PIQA, and HellaSwag, on math reasoning benchmarks such as GSM8K and MATH, and on the VTAB-1K image classification suite. The authors argue that EVA’s data-driven initialization leads to faster convergence and higher performance than competing methods such as LoRA and its variants.

EVA’s data-driven initialization is computed during the early stages of fine-tuning and is efficient, adding minimal overhead to training time. The authors also provide an ablation study demonstrating that both the scale and the direction of the initialization contribute to the final performance.

This paper makes a significant contribution to the field of PEFT. EVA is a simple, efficient, and effective method for fine-tuning foundation models without sacrificing performance. The authors consider EVA a promising approach for a wide range of applications and plan to investigate further improvements in the future.