Balanced LoRA: Removing Parameter Invariance to Accelerate Convergence
Authors: Valérie Castin†‡, Kimia Nadjahi†‡, Pierre Ablin, Gabriel Peyré†‡
Low-Rank Adaptation (LoRA) is the most widely adopted method for fine-tuning large language models. Notably, LoRA is inherently overparameterized: multiple pairs of low-rank factors can yield the same adapted weight matrix. We show, both theoretically and empirically, that these pairs exhibit significantly different condition numbers. As a result, the particular loss minimizer that training converges to directly affects the convergence rate of LoRA. Building on this observation, we introduce Balanced Low-Rank Adaptation (BaLoRA), a variant of LoRA that projects iterates onto a balanced manifold. This manifold improves the conditioning of the loss landscape while preserving the adapted matrix. The projection step is computationally lightweight and integrates seamlessly into existing fine-tuning pipelines. Empirically, BaLoRA converges faster than standard LoRA and achieves superior performance across a range of fine-tuning tasks.
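The invariance described in the abstract can be illustrated with a small NumPy sketch. It is not the authors' algorithm: the abstract does not define the balanced manifold, so the code assumes one common notion of balancedness from the matrix factorization literature, namely that the factors `B` and `A` satisfy `B.T @ B == A @ A.T`. The function name `balance`, the matrix shapes, and the rescaling matrix `G` are all hypothetical choices made for illustration.

```python
import numpy as np


def balance(B, A):
    """Rebalance low-rank factors without changing their product.

    Assumption (not taken from the paper): "balanced" means
    B.T @ B == A @ A.T. B has shape (m, r), A has shape (r, n), r <= min(m, n).
    """
    Q_b, R_b = np.linalg.qr(B)      # B = Q_b @ R_b, with R_b of shape (r, r)
    Q_a, R_a = np.linalg.qr(A.T)    # A.T = Q_a @ R_a, with R_a of shape (r, r)
    # SVD of a small r x r matrix: the cost stays low when r is small.
    U, s, Vt = np.linalg.svd(R_b @ R_a.T)
    sqrt_s = np.sqrt(s)
    B_bal = (Q_b @ U) * sqrt_s               # scale columns by sqrt of singular values
    A_bal = (sqrt_s[:, None] * Vt) @ Q_a.T   # scale rows by sqrt of singular values
    return B_bal, A_bal


rng = np.random.default_rng(0)
m, n, r = 64, 48, 4
B = rng.standard_normal((m, r))
A = rng.standard_normal((r, n))

# Any invertible G gives another factor pair with the same product B @ A.
G = np.diag([100.0, 10.0, 1.0, 0.1])
B_skewed, A_skewed = B @ G, np.linalg.inv(G) @ A

B_bal, A_bal = balance(B_skewed, A_skewed)

print("same product after rescaling:", np.allclose(B_skewed @ A_skewed, B @ A))
print("same product after balancing:", np.allclose(B_bal @ A_bal, B @ A))
print("cond(B) skewed vs balanced:", np.linalg.cond(B_skewed), np.linalg.cond(B_bal))
```

The two QR factorizations reduce the problem to an `r x r` SVD, which is one way such a step could stay cheap relative to a training iteration. Whether BaLoRA's projection takes this form is not stated in the abstract.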
Multimodal Large Language Models with Fusion Low Rank Adaptation for Device Directed Speech Detection
June 18, 2024 | Research areas: Human-Computer Interaction, Speech and Natural Language Processing | Conference: Interspeech
Although Large Language Models (LLMs) have shown promise for human-like conversations, they are primarily pre-trained on text data. Incorporating audio or video improves performance, but collecting large-scale multimodal data and pre-training multimodal LLMs is challenging. To this end, we propose a Fusion Low Rank Adaptation (FLoRA) technique that efficiently adapts a pre-trained unimodal LLM to consume new, previously unseen modalities via low…
Implicit Greedy Rank Learning in Autoencoders via Overparameterized Linear Networks
July 9, 2021 | Research areas: Computer Vision, Methods and Algorithms | Workshop at ICML
This paper was accepted at the workshop on Overparameterization: Pitfalls and Opportunities at the ICML 2021 conference.
Deep linear networks trained with gradient descent yield low rank solutions, as is typically studied in matrix factorization. In this paper, we take a step further and analyze implicit rank regularization in autoencoders. We show greedy learning of low-rank latent codes induced by a linear sub-network at the autoencoder…