Privacy risk has become a pressing challenge in both information theory and computer science due to the massive (centralized) collection of user data. In this paper, we survey privacy-preserving mechanisms and metrics through the lens of information theory, and unify different privacy metrics, including f-divergences, Rényi divergences, and differential privacy, via the likelihood ratio (and its logarithm). We review recent progress on designing privacy-preserving mechanisms according to the privacy metrics in (i) computer science, where differential privacy is the standard privacy notion that bounds the shift in output distribution under small input perturbations, and (ii) information theory, where privacy is guaranteed by minimizing information leakage. In particular, for differential privacy, we cover its important variants (e.g., Rényi differential privacy, Pufferfish privacy) and properties, discuss its connections with information-theoretic quantities, and provide operational interpretations of its additive-noise mechanisms. For information-theoretic privacy, we cover notable frameworks ranging from the privacy funnel, which originates from rate-distortion theory and the information bottleneck, to privacy guarantees against statistical inference and guessing, and to information obfuscation on samples and features. Finally, we discuss implementations of these privacy-preserving mechanisms in current data-driven machine learning scenarios, including deep learning, information obfuscation, federated learning, and dataset sharing.
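To make the additive-noise mechanisms mentioned above concrete, the sketch below shows the Laplace mechanism, the canonical example for ε-differential privacy: noise with scale sensitivity/ε is added to a query answer, so that the output distribution shifts by at most a factor of e^ε when one record changes. The function name and parameters are illustrative, not taken from the paper.

```python
import numpy as np

def laplace_mechanism(true_value, sensitivity, epsilon, rng=None):
    """Release true_value with Laplace noise calibrated to sensitivity/epsilon.

    Satisfies epsilon-differential privacy for a query whose answer changes
    by at most `sensitivity` when a single record is added or removed.
    """
    rng = rng or np.random.default_rng()
    scale = sensitivity / epsilon  # larger epsilon (weaker privacy) -> less noise
    return true_value + rng.laplace(loc=0.0, scale=scale)

# Example: a counting query has sensitivity 1; release it under epsilon = 0.5.
count = 1042
noisy_count = laplace_mechanism(count, sensitivity=1.0, epsilon=0.5)
```

Here the likelihood-ratio view of privacy is visible directly: for Laplace noise, the log-likelihood ratio between outputs on neighboring inputs is bounded by ε, which is exactly the differential privacy guarantee.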
Related readings and updates.
Earlier this year, Apple hosted the Workshop on Privacy-Preserving Machine Learning (PPML). This virtual event brought Apple and members of the academic research communities together to discuss the state of the art in the field of privacy-preserving machine learning through a series of talks and discussions over two days.