Extracurricular Learning: Knowledge Transfer Beyond Empirical Distribution
authors Hadi Pouransari, Mojan Javaheripi, Vinay Sharma, Oncel Tuzel
Knowledge distillation has been used to transfer knowledge learned by a sophisticated model (teacher) to a simpler model (student). This technique is widely used to compress model complexity. However, in most applications the compressed student model suffers from an accuracy gap with its teacher. We propose extracurricular learning, a novel knowledge distillation method, that bridges this gap by (1) modeling student and teacher output distributions; (2) sampling examples from an approximation to the underlying data distribution; and (3) matching student and teacher output distributions over this extended set including uncertain samples. We conduct rigorous evaluations on regression and classification tasks and show that compared to the standard knowledge distillation, extracurricular learning reduces the gap by 46% to 68%. This leads to major accuracy improvements compared to the empirical risk minimization-based training for various recent neural network architectures: 16% regression error reduction on the MPIIGaze dataset, +3.4% to +9.1% improvement in top-1 classification accuracy on the CIFAR100 dataset, and +2.9% top-1 improvement on the ImageNet dataset.
Deep neural networks are a milestone technique in the advancement of modern machine perception systems. However, in spite of the exceptional learning capacity and improved generalizability, these neural models still suffer from poor transferability. This is the challenge of domain shift—a shift in the relationship between data collected across different domains (e.g., computer generated vs. captured by real cameras). Models trained on data collected in one domain generally have poor accuracy on other domains. In this article, we discuss a new domain adaptation process that takes advantage of task-specific decision boundaries and the Wasserstein metric to bridge the domain gap, allowing the effective transfer of knowledge from one domain to another. As an additional advantage, this process is completely unsupervised, i.e., there is no need for new domain data to have labels or annotations.