Metric-Dependent Annotation Saturation for Learning from Label Distributions
Authors: Guneet Kohli
When annotators disagree on a label, the disagreement itself carries signal—and the number of annotators needed to capture it depends on the evaluation metric. We fine-tune NLI models on label distributions subsampled from ChaosNLI, a dataset providing 100 independent annotator judgments per item, and identify metric-dependent saturation. In our 3-class NLI setting, entropy correlation—whether the model identifies which items elicit disagreement—requires N ≈ 20–50 annotators to converge, while distributional match (KL divergence) saturates by N ≈ 10 (87–95% of improvement across five model seeds). This finding rests on a prior observation: soft labels carry item-specific signal that label smoothing cannot replicate. Across five smoothing intensities, entropy correlation clusters at r ≈ 0.45–0.49, while soft labels reach r = 0.643 (p < 0.001); per-item analysis traces this gap to smoothing’s inability to distinguish ambiguous items from clear ones. The soft-label advantage replicates across two architectures (DeBERTa, RoBERTa), a non-NLI-pretrained baseline, and an exploratory cross-domain evaluation on content safety. These results suggest that annotation budgets should be informed by the target evaluation metric rather than set uniformly.
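The measurement described above can be illustrated with a minimal sketch. It is not the authors' pipeline: it fine-tunes no model and uses synthetic votes in place of ChaosNLI. It only shows how soft labels built from N subsampled annotators can be compared against the full 100-annotator distribution using KL divergence and entropy correlation, and why label smoothing gives every item the same entropy. All function names (`make_synthetic_votes`, `subsample_soft_labels`, `smooth_labels`, and so on) and the Dirichlet data generator are assumptions made for illustration.

```python
import numpy as np


def make_synthetic_votes(num_items, num_annotators, num_classes, rng):
    """Draw synthetic annotator votes (an assumed stand-in for real annotations)."""
    probs = rng.dirichlet([0.5] * num_classes, size=num_items)
    return np.stack(
        [rng.choice(num_classes, size=num_annotators, p=p) for p in probs]
    )


def subsample_soft_labels(votes, n, num_classes, rng):
    """Pick n annotators per item without replacement; return label distributions."""
    order = np.argsort(rng.random(votes.shape), axis=1)[:, :n]
    picked = np.take_along_axis(votes, order, axis=1)
    # One-hot encode each vote, then average over the n sampled annotators.
    return np.eye(num_classes)[picked].mean(axis=1)


def entropy(p, eps=1e-12):
    """Shannon entropy of each row of p (natural log)."""
    p = np.clip(p, eps, 1.0)
    return -np.sum(p * np.log(p), axis=-1)


def kl_divergence(p, q, eps=1e-12):
    """Row-wise KL(p || q), clipping zeros to keep the result finite."""
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    return np.sum(p * (np.log(p) - np.log(q)), axis=-1)


def pearson_r(x, y):
    """Pearson correlation between two 1-D arrays."""
    return float(np.corrcoef(x, y)[0, 1])


def smooth_labels(majority, num_classes, alpha):
    """Standard label smoothing applied to the majority class."""
    onehot = np.eye(num_classes)[majority]
    return (1.0 - alpha) * onehot + alpha / num_classes


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    votes = make_synthetic_votes(200, 100, 3, rng)
    full = subsample_soft_labels(votes, 100, 3, rng)  # all 100 annotators
    for n in (5, 10, 20, 50, 100):
        sub = subsample_soft_labels(votes, n, 3, rng)
        print(
            n,
            kl_divergence(full, sub).mean(),
            pearson_r(entropy(full), entropy(sub)),
        )
    # Smoothed targets have identical entropy for every item, so they
    # cannot rank items by how much disagreement they elicit.
    smoothed = smooth_labels(full.argmax(axis=1), 3, alpha=0.1)
    print(entropy(smoothed).max() - entropy(smoothed).min())
```

Because this sketch compares training targets rather than model predictions, the values it prints say nothing about the saturation points reported above; it only makes the two metrics and the smoothing baseline concrete.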
Continuous Soft Pseudo-Labeling in ASR
November 15, 2022 | Research areas: Methods and Algorithms, Speech and Natural Language Processing | Workshop at NeurIPS
This paper was accepted at the workshop “I Can’t Believe It’s Not Better: Understanding Deep Learning Through Empirical Falsification”
Continuous pseudo-labeling (PL) algorithms such as slimIPL have recently emerged as a powerful strategy for semi-supervised learning in speech recognition. In contrast with earlier strategies that alternated between training a model and generating pseudo-labels (PLs) with it, here PLs are generated in end-to-end…
One-hot labels do not represent soft decision boundaries among concepts, so models trained on them are prone to overfitting. Using soft labels as targets provides regularization, but different soft labels might be optimal at different stages of optimization. In addition, training with fixed labels in the presence of noisy annotations leads to worse generalization. To address these limitations, we propose a framework in which we treat the labels as…