View publication

In large-scale statistical learning, data collection and model fitting are moving increasingly toward peripheral devices—phones, watches, fitness trackers—away from centralized data collection. Concomitant with this rise in decentralized data are increasing challenges of maintaining privacy while allowing enough information to fit accurate, useful statistical models. This motivates local notions of privacy—most significantly, local differential privacy, which provides strong protections against sensitive data disclosures—where data is obfuscated before a statistician or learner can even observe it, providing strong protections to individuals' data. Yet local privacy as traditionally employed may prove too stringent for practical use, especially in modern high-dimensional statistical and machine learning problems. Consequently, we revisit the types of disclosures and adversaries against which we provide protections, considering adversaries with limited prior information and ensuring that with high probability, ensuring they cannot reconstruct an individual's data within useful tolerances. By reconceptualizing these protections, we allow more useful data release—large privacy parameters in local differential privacy—and we design new (minimax) optimal locally differentially private mechanisms for statistical learning problems for all privacy levels. We thus present practicable approaches to large-scale locally private model training that were previously impossible, showing theoretically and empirically that we can fit large-scale image classification and language models with little degradation in utility.

Related readings and updates.

A Survey on Privacy from Statistical, Information and Estimation-Theoretic Views

The privacy risk has become an emerging challenge in both information theory and computer science due to the massive (centralized) collection of user data. In this paper, we overview privacy-preserving mechanisms and metrics from the lenses of information theory, and unify different privacy metrics, including f-divergences, Renyi divergences, and differential privacy, by the probability likelihood ratio (and the logarithm of it). We introduce…
See paper details

Learning with Privacy at Scale

Understanding how people use their devices often helps in improving the user experience. However, accessing the data that provides such insights — for example, what users type on their keyboards and the websites they visit — can compromise user privacy. We develop a system architecture that enables learning at scale by leveraging local differential privacy, combined with existing privacy best practices. We design efficient and scalable local differentially private algorithms and provide rigorous analyses to demonstrate the tradeoffs among utility, privacy, server computation, and device bandwidth. Understanding the balance among these factors leads us to a successful practical deployment using local differential privacy. This deployment scales to hundreds of millions of users across a variety of use cases, such as identifying popular emojis, popular health data types, and media playback preferences in Safari. We provide additional details about our system in the full version.

See article details