View publication

We design new differentially private algorithms for the problems of adversarial bandits and bandits with expert advice. For adversarial bandits, we give a simple and efficient conversion of any non-private bandit algorithms to private bandit algorithms. Instantiating our conversion with existing non-private bandit algorithms gives a regret upper bound of O(KTε)O\left(\frac{\sqrt{KT}}{\sqrt{\varepsilon}}\right), improving upon the existing upper bound O(KTlog(KT)ε)O\left(\frac{\sqrt{KT \log(KT)}}{\varepsilon}\right) in all privacy regimes. In particular, our algorithms allow for sublinear expected regret even when ε1T\varepsilon \leq \frac{1}{\sqrt{T}}, establishing the first known separation between central and local differential privacy. For bandits with expert advice, we give the first differentially private algorithms, with expected regret O(NTε),O(KTlog(N)log(KT)ε)O\left(\frac{\sqrt{NT}}{\sqrt{\varepsilon}}\right), O\left(\frac{\sqrt{KT\log(N)}\log(KT)}{\varepsilon}\right), and O~(N1/6K1/2T2/3log(NT)ε1/3+N1/2log(NT)ε)\tilde{O}\left(\frac{N^{1/6}K^{1/2}T^{2/3}\log(NT)}{\varepsilon^{1/3}} + \frac{N^{1/2}\log(NT)}{\varepsilon}\right), where KK and NN denote the number of actions and experts respectively. These rates allow us to get sublinear regret for different combinations of small and large KK, NN and ε\varepsilon.

Related readings and updates.

We design differentially private algorithms for the problem of prediction with expert advice under dynamic regret, also known as tracking the best expert. Our work addresses three natural types of adversaries, stochastic with shifting distributions, oblivious, and adaptive, and designs algorithms with sub-linear regret for all three cases. In particular, under a shifting stochastic adversary where the distribution may shift SSS times, we provide…
Read more
*= Equal Contributors Online prediction from experts is a fundamental problem in machine learning and several works have studied this problem under privacy constraints. We propose and analyze new algorithms for this problem that improve over the regret bounds of the best existing algorithms for non-adaptive adversaries. For approximate differential privacy, our algorithms achieve regret bounds of O(Tlog⁡d+log⁡d/ε)O(\sqrt{T \log d} + \log…
Read more