Memory-Efficient Backpropagation for Fine-Tuning LLMs on Resource-Constrained Mobile Devices
Authors: Congzheng Song, Xinyu Tang
Fine-tuning large language models (LLMs) with backpropagation, even for a subset of parameters such as LoRA, can be much more memory-consuming than inference and is often deemed impractical for resource-constrained mobile devices. Alternative methods, such as zeroth-order optimization (ZO), can greatly reduce the memory footprint but come at the cost of significantly slower model convergence (10× to 100× more steps than backpropagation). We propose a memory-efficient implementation of backpropagation (MeBP) on mobile devices that provides a better trade-off between memory usage and compute time, while converging faster and achieving better performance than the ZO baseline. We verify the effectiveness of MeBP on an iPhone 15 Pro Max and show that various LLMs, ranging from 0.5B to 4B parameters, can be fine-tuned using less than 1 GB of memory.
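To make the contrast between the two optimization families concrete, the following is a minimal sketch, not the MeBP implementation, of a generic two-point zeroth-order gradient estimate next to an exact gradient on a toy least-squares problem. The function names (`loss`, `zo_gradient`, `exact_gradient`, `run`), the problem size, and the hyperparameters are all illustrative assumptions. The ZO estimate needs only two forward evaluations of the loss, so nothing has to be stored for a backward pass, but each estimate follows a single random direction and is therefore noisy.

```python
import numpy as np


def loss(w, X, y):
    """Mean squared error of a linear model (toy stand-in for a training loss)."""
    residual = X @ w - y
    return 0.5 * float(np.mean(residual ** 2))


def zo_gradient(loss_fn, w, eps, rng):
    """Two-point zeroth-order estimate: forward evaluations only, no backward pass."""
    z = rng.standard_normal(w.shape)  # random probe direction
    # Central finite difference of the loss along z.
    scale = (loss_fn(w + eps * z) - loss_fn(w - eps * z)) / (2.0 * eps)
    return scale * z  # estimate is always parallel to z


def exact_gradient(w, X, y):
    """Analytic gradient of `loss`, i.e. what backpropagation would compute here."""
    return X.T @ (X @ w - y) / len(y)


def run(use_zo, steps=500, lr=0.02, seed=0):
    """Train on a synthetic problem; return (initial_loss, final_loss)."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((64, 8))
    y = X @ rng.standard_normal(8)
    w = np.zeros(8)
    initial = loss(w, X, y)
    for _ in range(steps):
        if use_zo:
            g = zo_gradient(lambda v: loss(v, X, y), w, 1e-3, rng)
        else:
            g = exact_gradient(w, X, y)
        w = w - lr * g
    return initial, loss(w, X, y)


if __name__ == "__main__":
    print("exact gradient:", run(use_zo=False))
    print("zeroth-order:  ", run(use_zo=True))
```

Running the script prints the initial and final loss for each variant under the same step budget. The toy problem has only 8 parameters, so it says nothing about the 10× to 100× figure quoted above; it only shows the mechanics of the two update rules.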
Pretraining with Hierarchical Memories: Separating Long-Tail and Common Knowledge
January 9, 2026 | Research areas: Knowledge Bases and Search, Methods and Algorithms
The impressive performance gains of modern language models currently rely on scaling parameters: larger models store more world knowledge and reason better. Yet compressing all world knowledge into parameters is unnecessary, as only a fraction is used per prompt, and impractical for edge devices with limited inference-time memory and compute. We address this shortcoming by a memory-augmented architecture and a pretraining strategy aligned with…
MobileOne: An Improved One millisecond Mobile Backbone
March 24, 2023 | Research areas: Computer Vision, Methods and Algorithms | Conference: CVPR
Efficient neural network backbones for mobile devices are often optimized for metrics such as FLOPs or parameter count. However, these metrics may not correlate well with latency of the network when deployed on a mobile device. Therefore, we perform extensive analysis of different metrics by deploying several mobile-friendly networks on a mobile device. We identify and analyze architectural and optimization bottlenecks in recent efficient neural…