Adaptive Weight Decay
AuthorsAmin Ghiasi, Ali Shafahi, Reza Ardekani
We propose adaptive weight decay, which automatically tunes the hyper-parameter for weight decay during each training iteration. For classification problems, we propose changing the value of the weight decay hyper-parameter on the fly based on the strength of updates from the classification loss (i.e., gradient of cross-entropy), and the regularization loss (i.e.,
In spite of the success of deep learning, we know relatively little about the many possible solutions to which a trained network can converge. Networks generally converge to some local minima—a region in space where the loss function increases in every direction—of their loss function during training. Our research explores why local minima outperforms others when a trained network is evaluated on a held-out test set.