
Privacy risk has become an emerging challenge in both information theory and computer science due to the massive (centralized) collection of user data. In this paper, we survey privacy-preserving mechanisms and metrics through the lens of information theory, and unify different privacy metrics, including f-divergences, Rényi divergences, and differential privacy, through the probability likelihood ratio (and its logarithm). We introduce recent progress in designing privacy-preserving mechanisms according to the privacy metrics in (i) computer science, where differential privacy is the standard privacy notion that controls the output shift under a small input perturbation, and (ii) information theory, where privacy is guaranteed by minimizing information leakage. In particular, for differential privacy, we include its important variants (e.g., Rényi differential privacy, Pufferfish privacy) and properties, discuss its connections with information-theoretic quantities, and provide operational interpretations of its additive noise mechanisms. For information-theoretic privacy, we cover notable frameworks ranging from the privacy funnel, which originated from rate-distortion theory and the information bottleneck, to privacy guarantees against statistical inference/guessing and information obfuscation on samples and features. Finally, we discuss the implementation of these privacy-preserving mechanisms in current data-driven machine learning scenarios, including deep learning, information obfuscation, federated learning, and dataset sharing.
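
As a rough illustration of how the metrics named above can all be written in terms of a likelihood ratio, the following sketch uses the standard textbook definitions. The notation is an assumption made here for illustration and is not taken from the paper: P and Q are two output distributions, \ell = dP/dQ is their likelihood ratio, f is a convex function with f(1) = 0, and x \sim x' denotes neighboring inputs of a mechanism P_{Y|X}.

```latex
% Likelihood ratio between two output distributions (assumed notation)
\ell(y) = \frac{dP}{dQ}(y)

% f-divergence: expectation of a convex function of the likelihood ratio
D_f(P \,\|\, Q) = \mathbb{E}_{Y \sim Q}\!\left[ f\big(\ell(Y)\big) \right]

% Renyi divergence of order \alpha > 1: log-moment of the likelihood ratio
D_\alpha(P \,\|\, Q) = \frac{1}{\alpha - 1} \log \mathbb{E}_{Y \sim Q}\!\left[ \ell(Y)^{\alpha} \right]

% Pure \varepsilon-differential privacy: uniform bound on the log-likelihood ratio
\sup_{x \sim x'} \, \sup_{y} \,
  \left| \log \frac{P_{Y|X=x}(y)}{P_{Y|X=x'}(y)} \right| \le \varepsilon

% (\alpha, \varepsilon)-Renyi differential privacy: the same bound in Renyi divergence
\sup_{x \sim x'} D_\alpha\!\left( P_{Y|X=x} \,\|\, P_{Y|X=x'} \right) \le \varepsilon
```

Read this way, the metrics differ mainly in how they aggregate the likelihood ratio: an average for f-divergences, a log-moment for Rényi divergences, and a worst case for pure differential privacy.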

Related readings and updates.

Apple Privacy-Preserving Machine Learning Workshop 2022

Earlier this year, Apple hosted the Privacy-Preserving Machine Learning (PPML) workshop. This virtual event brought Apple and members of the academic research communities together to discuss the state of the art in the field of privacy-preserving machine learning through a series of talks and discussions over two days.


Individual Privacy Accounting via a Renyi Filter

We consider a sequential setting in which a single dataset of individuals is used to perform adaptively-chosen analyses, while ensuring that the differential privacy loss of each participant does not exceed a pre-specified privacy budget. The standard approach to this problem relies on bounding a worst-case estimate of the privacy loss over all individuals and all possible values of their data, for every single analysis. Yet, in many scenarios…