Video | August 29, 2024

Apple Workshop on Privacy-Preserving Machine Learning: Matrix Factorization DP-FTRL outperforms DP-SGD for cross-device federated learning and centralized training

Authors: Brendan McMahan (Google)

Related readings and updates.

Metric-Dependent Annotation Saturation for Learning from Label Distributions

June 23, 2026 | Research areas: Data Science and Annotation; Speech and Natural Language Processing

When annotators disagree on a label, the disagreement itself carries signal—and the number of annotators needed to capture it depends on the evaluation metric. We fine-tune NLI models on label distributions subsampled from ChaosNLI, a dataset providing 100 independent annotator judgments per item, and identify metric-dependent saturation. In our 3-class NLI setting, entropy correlation—whether the model identifies which items elicit…

Nine Judges, Two Effective Votes: Correlated Errors Undermine LLM Evaluation Panels

June 23, 2026 | Research areas: Data Science and Annotation; Speech and Natural Language Processing

LLM-as-a-judge panels aggregate votes from multiple models, with the expectation that diverse models yield more reliable evaluations. We develop a framework to measure the true informational value of such panels and quantify how far their reliability falls short of the independent-voting ideal. Testing a panel of 9 frontier LLMs from 7 model families on three natural language inference datasets (each with 100 human annotations per item), we find…

