Memory-Efficient Backpropagation for Fine-Tuning LLMs on Resource-Constrained Mobile Devices

Authors: Congzheng Song, Xinyu Tang

Fine-tuning large language models (LLMs) with backpropagation, even for a subset of parameters such as LoRA, can be much more memory-consuming than inference and is often deemed impractical for resource-constrained mobile devices. Alternative methods, such as zeroth-order optimization (ZO), can greatly reduce the memory footprint but come at the cost of significantly slower model convergence (10× to 100× more steps than backpropagation). We propose a memory-efficient implementation of backpropagation (MeBP) on mobile devices that provides a better trade-off between memory usage and compute time, while converging faster and achieving better performance than the ZO baseline. We verify the effectiveness of MeBP on an iPhone 15 Pro Max and show that various LLMs, ranging from 0.5B to 4B parameters, can be fine-tuned using less than 1 GB of memory.
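
To make the contrast between backpropagation and zeroth-order optimization concrete, here is a minimal sketch on a toy linear model. It is not the MeBP implementation, nor the ZO baseline evaluated in the paper. The function names, the toy loss, the two-point (SPSA-style) estimator, and the step size `eps` are all illustrative assumptions. The point it tries to show is that a ZO estimate needs only forward evaluations of the loss (so no activations are kept for a backward pass), but each estimate is a noisy projection of the true gradient onto one random direction.

```python
import random


def loss(w, x, y):
    """Squared error of a linear model; a toy stand-in for an LLM's loss."""
    pred = sum(wi * xi for wi, xi in zip(w, x))
    return 0.5 * (pred - y) ** 2


def exact_gradient(w, x, y):
    """Analytic gradient, i.e. what backpropagation would return here."""
    pred = sum(wi * xi for wi, xi in zip(w, x))
    return [(pred - y) * xi for xi in x]


def zo_gradient(w, x, y, rng, eps=1e-3):
    """Two-point zeroth-order estimate using only two loss evaluations.

    Hypothetical helper: samples a random direction z, measures the slope
    of the loss along z by central differences, and returns slope * z.
    """
    z = [rng.gauss(0.0, 1.0) for _ in w]
    w_plus = [wi + eps * zi for wi, zi in zip(w, z)]
    w_minus = [wi - eps * zi for wi, zi in zip(w, z)]
    slope = (loss(w_plus, x, y) - loss(w_minus, x, y)) / (2 * eps)
    return [slope * zi for zi in z]


def averaged_zo_gradient(w, x, y, rng, samples):
    """Average many ZO estimates; the mean approaches the exact gradient."""
    total = [0.0] * len(w)
    for _ in range(samples):
        g = zo_gradient(w, x, y, rng)
        total = [t + gi for t, gi in zip(total, g)]
    return [t / samples for t in total]


if __name__ == "__main__":
    w, x, y = [0.0] * 4, [1.0, 2.0, -1.0, 0.5], 1.0
    rng = random.Random(0)
    print("exact gradient:     ", exact_gradient(w, x, y))
    print("single ZO estimate: ", zo_gradient(w, x, y, rng))
    print("mean of 20000 ZO:   ", averaged_zo_gradient(w, x, y, rng, 20000))
```

A single ZO estimate can differ substantially from the exact gradient, and only the average over many random directions approaches it. This noise is one common explanation for why ZO methods tend to need many more optimization steps than backpropagation.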

Related readings and updates.

Pretraining with Hierarchical Memories: Separating Long-Tail and Common Knowledge

January 9, 2026 | Research areas: Knowledge Bases and Search, Methods and Algorithms | Conference: ICLR

The impressive performance gains of modern language models currently rely on scaling parameters: larger models store more world knowledge and reason better. Yet compressing all world knowledge into parameters is unnecessary, as only a fraction is used per prompt, and impractical for edge devices with limited inference-time memory and compute. We address this shortcoming by a memory-augmented architecture and a pretraining strategy aligned with…

MobileOne: An Improved One millisecond Mobile Backbone

March 24, 2023 | Research areas: Computer Vision, Methods and Algorithms | Conference: CVPR

Efficient neural network backbones for mobile devices are often optimized for metrics such as FLOPs or parameter count. However, these metrics may not correlate well with latency of the network when deployed on a mobile device. Therefore, we perform extensive analysis of different metrics by deploying several mobile-friendly networks on a mobile device. We identify and analyze architectural and optimization bottlenecks in recent efficient neural…
