
We study the problem of differentially private stochastic convex optimization (DP-SCO) with heavy-tailed gradients, where we assume a $k^{\text{th}}$-moment bound on the Lipschitz constants of sample functions, rather than a uniform bound. We propose a new reduction-based approach that enables us to obtain the first optimal rates (up to logarithmic factors) in the heavy-tailed setting, achieving error $G_2 \cdot \frac{1}{\sqrt{n}} + G_k \cdot \left(\frac{\sqrt{d}}{n\varepsilon}\right)^{1 - \frac{1}{k}}$ under $(\varepsilon, \delta)$-approximate differential privacy, up to a mild $\textup{polylog}(\frac{\log n}{\delta})$ factor, where $G_2^2$ and $G_k^k$ are the $2^{\text{nd}}$ and $k^{\text{th}}$ moment bounds on sample Lipschitz constants, nearly matching a lower bound of Lowy et al. (2023).
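As a quick sanity check on the rate, if the sample Lipschitz constants are in fact uniformly bounded by some $G$, then $G_2 \le G$ and $G_k \le G$ for every $k$, and taking $k \to \infty$ the bound above recovers the familiar uniform-Lipschitz DP-SCO rate

$$G \cdot \Big(\frac{1}{\sqrt{n}} + \frac{\sqrt{d}}{n\varepsilon}\Big),$$

so the heavy-tailed result can be read as replacing the worst-case constant $G$ by the moment bounds $G_2$ and $G_k$.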

Related readings and updates.

How Smooth Is Attention?

Self-attention and masked self-attention are at the heart of Transformers' outstanding success. Still, our mathematical understanding of attention, in particular of its Lipschitz properties — which are key when it comes to analyzing robustness and expressive power — is incomplete. We provide a detailed study of the Lipschitz constant of self-attention in several practical scenarios, discussing the impact of the sequence length and layer…

Private Stochastic Convex Optimization: Optimal Rates in ℓ1 Geometry

Stochastic convex optimization over an $\ell_1$-bounded domain is ubiquitous in machine learning applications such as LASSO but remains poorly understood when learning with differential privacy. We show that, up to logarithmic factors, the optimal excess population loss of any $(\varepsilon, \delta)$-differentially private optimizer is $\sqrt{\log(d)/n} + \sqrt{d}/(\varepsilon n)$. The upper bound is based on…
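For intuition, the two terms in this rate play distinct roles: the first term, $\sqrt{\log(d)/n}$, is the standard non-private statistical rate for stochastic convex optimization over an $\ell_1$ ball, attainable for instance by stochastic mirror descent, while the second term, $\sqrt{d}/(\varepsilon n)$, captures the additional cost of enforcing $(\varepsilon, \delta)$-differential privacy.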