Scrubbing Sensitive Data from Code Models: A New Approach to Machine Unlearning
Code language models (CLMs), such as those used for generating and summarizing code, have become incredibly powerful tools. However, recent research has uncovered a significant privacy concern: these models can inadvertently memorize and reproduce sensitive information from their training data. This could include anything from personal details like email addresses and passwords to proprietary API keys.
Traditionally, addressing this issue has involved costly and time-consuming full model retraining. Now, a new study introduces a more efficient and targeted method called CODEERASER, which leverages “machine unlearning” to scrub sensitive information from already trained models without the need for complete retraining.
The Problem of Memorization
CLMs learn by processing vast amounts of code, often scraped from public repositories. During this process, they can sometimes “memorize” specific snippets of code, including sensitive data that might have been accidentally included. When prompted appropriately, these models can then regurgitate this memorized information verbatim. This poses a serious privacy risk, akin to a model leaking confidential data it was never intended to retain.
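To make the risk concrete, here is a minimal sketch of how one might probe a model for verbatim regurgitation: feed it the prefix that preceded a secret in its training data and inspect the greedy continuation. The checkpoint name and the strings are placeholders for illustration, not examples taken from the paper.

```python
# Minimal sketch: probing a code model for verbatim memorization.
# The checkpoint and the "secret" below are hypothetical placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "codeparrot/codeparrot-small"   # assumed checkpoint for illustration
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prefix = 'AWS_SECRET_KEY = "'                # prompt that preceded the secret in training data
leaked_value = "EXAMPLEKEY123"               # hypothetical secret we are checking for

inputs = tokenizer(prefix, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=20, do_sample=False)
completion = tokenizer.decode(output_ids[0], skip_special_tokens=True)

print("Verbatim leak!" if leaked_value in completion else "No leak for this probe.")
```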
Previous attempts to mitigate this include removing duplicate data before training or employing differential privacy techniques. However, these methods often require retraining the entire model, which is computationally expensive and impractical for deployed models.
CODEERASER: Selective Unlearning
The researchers behind this new paper propose CODEERASER, a novel machine unlearning technique. Instead of erasing entire code snippets, CODEERASER intelligently targets and removes only the sensitive segments within a piece of code. It achieves this by:
- Identifying Sensitive Data: Using the open-source tool detect-secrets, the method precisely pinpoints sensitive elements like API keys and passwords within code (a detection sketch follows this list).
- Targeted Gradient Ascent: CODEERASER applies a specific learning update (gradient ascent) only to these identified sensitive segments. This “pushes” the model to forget this information (a loss sketch also follows the list).
- Preserving Code Integrity: Crucially, while erasing sensitive data, CODEERASER uses another technique (gradient descent) on the surrounding “non-sensitive” code. This ensures that the overall structure and functionality of the code remain intact.
- Maintaining Model Utility: A constraint-based approach is also incorporated to ensure that the model’s general coding capabilities, evaluated on benchmarks like HumanEval, are not significantly degraded.
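The detection step relies on the open-source detect-secrets scanner named above. The sketch below mirrors the library's documented Python usage; the file path is a placeholder, and the paper's actual pipeline may wrap the tool differently.

```python
# Minimal sketch: flagging secrets in a source file with detect-secrets.
# The scanned path is a placeholder; this follows the library's documented
# usage, not necessarily the paper's exact integration.
import json

from detect_secrets import SecretsCollection
from detect_secrets.settings import default_settings

secrets = SecretsCollection()
with default_settings():
    secrets.scan_file("training_sample.py")  # hypothetical file to scan

# Each finding reports the detector type and the line number of the secret,
# which is the information a token-level unlearning method needs for its mask.
print(json.dumps(secrets.json(), indent=2))
```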
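At the level of the training objective, the ascent-on-sensitive / descent-on-non-sensitive split can be pictured as a token-masked loss. The PyTorch sketch below is illustrative only: it assumes a Hugging Face-style causal LM, and `sensitive_mask`, `ascent_weight`, and the function name are our own labels, not the paper's implementation. The paper's additional utility constraint (e.g., checking HumanEval performance) is omitted here.

```python
# Illustrative sketch of selective unlearning: gradient ascent on tokens
# flagged as sensitive, standard gradient descent on the rest.
# `sensitive_mask` marks target tokens flagged as secrets (1 = sensitive).
import torch
import torch.nn.functional as F

def selective_unlearning_loss(model, input_ids, sensitive_mask, ascent_weight=1.0):
    """Negate the loss on sensitive tokens (ascend), keep it on the rest (descend)."""
    outputs = model(input_ids=input_ids)
    logits = outputs.logits[:, :-1, :]        # predict token t+1 from token t
    targets = input_ids[:, 1:]
    mask = sensitive_mask[:, 1:].float()      # align the mask with the targets

    token_loss = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        targets.reshape(-1),
        reduction="none",
    ).view(targets.shape)

    forget_loss = (token_loss * mask).sum() / mask.sum().clamp(min=1)
    retain_loss = (token_loss * (1 - mask)).sum() / (1 - mask).sum().clamp(min=1)

    # Ascend on sensitive tokens (maximize their loss), descend elsewhere.
    return retain_loss - ascent_weight * forget_loss
```

Minimizing this quantity with an ordinary optimizer increases the loss on the flagged secret tokens while preserving the model's behavior on the surrounding code.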
Promising Results
The study involved extensive experiments on three popular CLM families: CodeParrot, CodeGen-Mono, and Qwen2.5-Coder. The results demonstrate that CODEERASER is highly effective at erasing targeted sensitive information. For example, on the Qwen2.5-Coder-7B model, CODEERASER reduced memorization by an impressive 93.89% while retaining nearly 99% of the model’s original performance.
Furthermore, CODEERASER is remarkably efficient. The unlearning process for a batch of sensitive samples takes mere seconds, a stark contrast to the days or even weeks required for full retraining. The research also highlights that while factors like the type and duplication frequency of sensitive data can influence unlearning, CODEERASER remains robust.
This work represents a significant step forward in addressing the privacy vulnerabilities of code language models, offering a practical and efficient solution for scrubbing sensitive data without compromising the utility of these powerful AI tools.