Differentially Private Heavy Hitter Detection using Federated Analytics
In collaboration with Stanford University
Authors: Karan Chadha, Junye Chen, John Duchi, Vitaly Feldman, Hanieh Hashemi, Omid Javidbakht, Audra McMillan, and Kunal Talwar
This work was accepted at the Federated Learning and Analytics in Practice: Algorithms, Systems, Applications, and Opportunities workshop at ICML 2023.
In this work, we study practical heuristics to improve the performance of prefix-tree-based algorithms for differentially private heavy hitter detection. Our model assumes each user has multiple data points, and the goal is to learn as many of the most frequent data points as possible across all users' data under aggregate and local differential privacy. We propose an adaptive hyperparameter tuning algorithm that improves performance while satisfying computational, communication, and privacy constraints. We explore the impact of different data-selection schemes, as well as the effect of introducing deny lists across multiple runs of the algorithm. We evaluate these improvements through extensive experiments on the Reddit dataset [Caldas et al., 2018] on the task of learning the most frequent words.
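To make the prefix-tree approach concrete, here is a minimal, self-contained Python sketch of one common variant: in each round, users report a one-character extension of a surviving candidate prefix through k-ary randomized response (a standard local-DP mechanism), and the server debiases and thresholds the noisy counts to decide which prefixes to grow. The function names, round structure, sentinel symbols, and parameter values below are illustrative assumptions, not the specific algorithm or heuristics evaluated in the paper.

```python
import math
import random
from collections import Counter

END = "$"  # sentinel marking the end of a complete word


def k_ary_randomized_response(true_value, domain, epsilon):
    """Report the true value with probability e^eps / (e^eps + k - 1),
    otherwise report one of the other k - 1 values uniformly at random."""
    k = len(domain)
    p_truth = math.exp(epsilon) / (math.exp(epsilon) + k - 1)
    if random.random() < p_truth:
        return true_value
    return random.choice([v for v in domain if v != true_value])


def heavy_hitter_prefixes(words, alphabet, epsilon, rounds, threshold):
    """Grow a prefix tree one character per round from noisy user reports.

    For simplicity, every user reports in every round, so the per-round
    privacy cost composes across rounds; a real deployment would instead
    partition users across rounds or use tighter composition accounting.
    """
    candidates, finished = {""}, set()
    symbols = list(alphabet) + [END]
    n = len(words)
    for depth in range(rounds):
        # Candidate one-character extensions, plus a null symbol "*" for
        # users whose word matches no surviving prefix.
        domain = [p + s for p in sorted(candidates) for s in symbols] + ["*"]
        counts = Counter()
        for w in words:  # one randomized report per user per round
            padded = w + END
            truth = padded[: depth + 1]
            if padded[:depth] not in candidates or truth not in domain:
                truth = "*"
            counts[k_ary_randomized_response(truth, domain, epsilon)] += 1
        # Debias the randomized-response counts before thresholding.
        k = len(domain)
        p = math.exp(epsilon) / (math.exp(epsilon) + k - 1)
        q = (1.0 - p) / (k - 1)
        survivors = {
            v for v in domain
            if v != "*" and (counts[v] - n * q) / (p - q) >= threshold
        }
        finished |= {c for c in survivors if c.endswith(END)}
        candidates = {c for c in survivors if not c.endswith(END)}
        if not candidates:
            break
    return {c[:-1] for c in finished}  # strip the terminator


# Example: recover frequent words from a toy population.
words = ["the"] * 60 + ["they"] * 25 + ["rare"] * 3
print(heavy_hitter_prefixes(words, "abcdefghijklmnopqrstuvwxyz",
                            epsilon=4.0, rounds=6, threshold=10))
```

In this sketch, rare words such as "rare" are pruned early because their debiased counts fall below the threshold, which is what bounds server computation and communication as the tree grows. The knobs the paper's heuristics would act on are exactly the ones exposed here: the per-round privacy budget, the survival threshold, and which of a user's multiple data points is selected for reporting.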