Memory-Efficient Backpropagation for Fine-Tuning LLMs on Resource-Constrained Mobile Devices
AuthorsCongzheng Song, Xinyu Tang
Memory-Efficient Backpropagation for Fine-Tuning LLMs on Resource-Constrained Mobile Devices
AuthorsCongzheng Song, Xinyu Tang
Fine-tuning large language models (LLMs) with backpropagation — even for a subset of parameters such as LoRA — can be much more memory-consuming than inference and is often deemed impractical for resource-constrained mobile devices. Alternative methods, such as zeroth-order optimization (ZO), can greatly reduce the memory footprint but come at the cost of significantly slower model convergence (10× to 100× more steps than backpropagation). We propose a memory-efficient implementation of backpropagation (MeBP) on mobile devices that provides better trade-off between memory usage and compute time, while converging faster and achieving better performance than the ZO baseline. We verify the effectiveness of MeBP on an iPhone 15 Pro Max and show that various LLMs, ranging from 0.5B to 4B parameters, can be fine-tuned using less than 1GB of memory.
Proxy-FDA: Proxy-Based Feature Distribution Alignment for Fine-Tuning Vision Foundation Models Without Forgetting
June 5, 2025research area Computer Vision, research area Methods and Algorithmsconference ICML
Vision foundation models pre-trained on massive data encode rich representations of real-world concepts, which can be adapted to downstream tasks by fine-tuning. However, fine-tuning foundation models on one task often leads to the issue of concept forgetting on other tasks. Recent methods of robust fine-tuning aim to mitigate forgetting of prior knowledge without affecting the fine-tuning performance. Knowledge is often preserved by matching the…
MobileOne: An Improved One millisecond Mobile Backbone
March 24, 2023research area Computer Vision, research area Methods and Algorithmsconference CVPR
Efficient neural network backbones for mobile devices are often optimized for metrics such as FLOPs or parameter count. However, these metrics may not correlate well with latency of the network when deployed on a mobile device. Therefore, we perform extensive analysis of different metrics by deploying several mobile-friendly networks on a mobile device. We identify and analyze architectural and optimization bottlenecks in recent efficient neural…