MUSCLE: A Model Update Strategy for Compatible LLM Evolution

Authors: Jessica Echterhoff, Fartash Faghri, Raviteja Vemulapalli, Ting-Yao Hu, Chun-Liang Li, Oncel Tuzel, Hadi Pouransari


Large Language Models (LLMs) are regularly updated to enhance performance, typically through changes in data or architecture. Within the update process, developers often prioritize improving overall performance metrics, paying less attention to maintaining compatibility with earlier model versions. Instance-level degradation (instance regression) of performance from one model version to the next can interfere with a user's mental model of the capabilities of a particular language model. Users having to adapt their mental model with every update can lead to dissatisfaction, especially when the new model has degraded compared to a prior version for a known use case (model update regression). We find that when pretrained LLM base models are updated, fine-tuned user-facing downstream task adapters experience negative flips -- previously correct instances are now predicted incorrectly. We observe model update regression between different model versions on a diverse set of tasks and models, even when the downstream task training procedures remain identical. We argue for the importance of maintaining model update compatibility during updates, and present evaluation metrics designed specifically for generative tasks that are also applicable to discriminative tasks. We propose a training strategy to minimize the extent of instance regression in model updates, involving training of a compatibility adapter that can enhance task fine-tuned language models. We show that our proposed method reduces negative flips by up to 40%, e.g. when updating Llama 1 to Llama 2.
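To make the notion of a negative flip concrete, the sketch below counts instances that the old model got right and the updated model gets wrong. It is a minimal illustration, assuming per-instance correctness is already available as booleans; the function name `negative_flip_rate` and the toy data are hypothetical, and the paper's own metrics for generative tasks may be defined differently.

```python
from typing import Sequence


def negative_flip_rate(old_correct: Sequence[bool], new_correct: Sequence[bool]) -> float:
    """Fraction of instances that were correct before the update and are incorrect after it.

    Hypothetical helper for illustration; not taken from the paper.
    """
    if len(old_correct) != len(new_correct):
        raise ValueError("Both models must be evaluated on the same instances.")
    if len(old_correct) == 0:
        raise ValueError("At least one instance is required.")
    # A negative flip: the old model was right, the new model is wrong.
    flips = sum(1 for old, new in zip(old_correct, new_correct) if old and not new)
    return flips / len(old_correct)


# Toy data (made up): accuracy rises from 3/5 to 4/5,
# yet one previously correct instance (index 1) regresses.
old = [True, True, False, False, True]
new = [True, False, True, True, True]

old_accuracy = sum(old) / len(old)
new_accuracy = sum(new) / len(new)
nfr = negative_flip_rate(old, new)
print(f"accuracy: {old_accuracy:.2f} -> {new_accuracy:.2f}, negative flip rate: {nfr:.2f}")
```

The toy data shows why aggregate accuracy alone can hide regression: the updated model scores higher overall while still breaking an instance the earlier version handled correctly.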

Figure 1: A real example of a model update that introduces instance regression (negative flip, where a previously correct prediction becomes incorrect) (top). With our model update strategy using a compatibility adapter approach, we enhance model update compatibility to the previous model while maintaining the overall performance gain (e.g. measured by the ROUGE-1 score for the summarization task) of the model update (bottom).


