DiceHuBERT: Distilling HuBERT with a Self-Supervised Learning Objective

Authors: Hyung Gun Chi, Zakaria Aldeneh, Tatiana Likhomanenko, Oggi Rudovic, Takuya Higuchi, Li-Wei Chen†, Shinji Watanabe†, Ahmed Hussen Abdelaziz‡**

We introduce DiceHuBERT, a knowledge distillation framework for compressing HuBERT, a widely used self-supervised learning (SSL)-based speech foundation model. Unlike existing distillation methods that rely on layer-wise and feature-wise mapping between teacher and student models, DiceHuBERT leverages HuBERT’s iterative self-distillation mechanism by directly replacing the original model with a student model. This replacement allows the student to be trained using the same SSL objective used when pre-training HuBERT, eliminating the need for additional modules or architectural constraints. Experimental results on SUPERB show that DiceHuBERT consistently outperforms existing distillation methods, improving phoneme recognition performance by over 21% and ASR performance by more than 14%. Furthermore, DiceHuBERT demonstrates competitive performance across multiple tasks, highlighting its clear advantage.

† Carnegie Mellon University
‡ Meta
** Work done while at Apple
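
To make the training setup in the abstract more concrete, here is a minimal sketch of a HuBERT-style masked prediction objective applied to a student model. It is an illustration under stated assumptions, not the authors' implementation: it assumes that discrete targets come from assigning teacher features to k-means centroids (as in HuBERT's iterative pre-training) and that the student emits one logit per cluster for each frame. The function names `assign_clusters` and `masked_prediction_loss`, the array shapes, and the random inputs are all hypothetical.

```python
import numpy as np


def assign_clusters(teacher_feats, centroids):
    """Map each teacher frame to its nearest centroid (hypothetical helper).

    teacher_feats: (T, D) frame-level features from the teacher.
    centroids:     (K, D) k-means centroids fitted on teacher features.
    Returns an integer array of shape (T,) with cluster ids in [0, K).
    """
    # Squared Euclidean distance between every frame and every centroid.
    diffs = teacher_feats[:, None, :] - centroids[None, :, :]
    sq_dists = (diffs ** 2).sum(axis=-1)
    return sq_dists.argmin(axis=1)


def masked_prediction_loss(student_logits, targets, mask):
    """Cross-entropy over masked frames only (assumed form of the SSL loss).

    student_logits: (T, K) unnormalized scores from the student.
    targets:        (T,) cluster ids derived from the teacher.
    mask:           (T,) boolean, True where the student's input was masked.
    """
    # Numerically stable log-softmax over the cluster dimension.
    shifted = student_logits - student_logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Negative log-likelihood of the target cluster at each frame.
    nll = -log_probs[np.arange(len(targets)), targets]
    return nll[mask].mean()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T, D, K = 50, 16, 8                      # frames, feature dim, clusters
    teacher_feats = rng.normal(size=(T, D))  # stand-in for teacher outputs
    centroids = rng.normal(size=(K, D))      # stand-in for fitted k-means
    student_logits = rng.normal(size=(T, K)) # stand-in for student outputs
    mask = rng.random(T) < 0.5               # stand-in for span masking
    mask[0] = True                           # ensure at least one masked frame

    targets = assign_clusters(teacher_feats, centroids)
    print("loss:", masked_prediction_loss(student_logits, targets, mask))
```

In this sketch the teacher contributes only through the cluster targets, so the student needs no layer-wise or feature-wise mapping to the teacher and is free to use a different architecture.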
