View publication

Recent observations have advanced our understanding of the neural network optimization landscape, revealing the existence of (1) paths of high accuracy containing diverse solutions and (2) wider minima offering improved performance. Previous methods observing diverse paths require multiple training runs. In contrast we aim to leverage both property (1) and (2) with a single method and in a single training run. With a similar computational cost as training one model, we learn lines, curves, and simplexes of high-accuracy neural networks. These neural network subspaces contain diverse solutions that can be ensembled, approaching the ensemble performance of independently trained networks without the training cost. Moreover, using the subspace midpoint boosts accuracy, calibration, and robustness to label noise, outperforming Stochastic Weight Averaging.

Related readings and updates.

Learning Compressible Subspaces for Adaptive Network Compression at Inference Time

*= Equal Contribution When deploying deep learning models to a device, it is traditionally assumed that available computational resources (compute, memory, and power) remain static. However, real-world computing systems do not always provide stable resource guarantees. Computational resources need to be conserved when load from other processes is high or battery power is low. Inspired by recent works on neural network subspaces, we propose a…
See paper details

Improving Neural Network Subspaces

In spite of the success of deep learning, we know relatively little about the many possible solutions to which a trained network can converge. Networks generally converge to some local minima—a region in space where the loss function increases in every direction—of their loss function during training. Our research explores why local minima outperforms others when a trained network is evaluated on a held-out test set.

See article details