Paper | August 2021

Smooth Sequential Optimization with Delayed Feedback

Authors: Srivas Chennu, Jamie Martin, Puli Liyanagama, Phil Mohr

This paper was accepted at the workshop on Bayesian Causal Inference for Real World Interactive Systems at the KDD 2021 conference.

Stochastic delays in feedback make sequential learning with multi-armed bandits unstable. Recently, empirical Bayesian shrinkage has been shown to improve reward estimation in bandit learning. Here, we propose a novel adaptation of shrinkage that computes smoothed reward estimates from windowed cumulative inputs, to cope with incomplete information from delayed feedback and with non-stationary rewards. Using numerical simulations, we show that this adaptation retains the benefits of shrinkage and improves the stability of reward estimation by more than 50%. Our proposal reduces variability in treatment allocations to the best arm by up to 3.8x and improves statistical accuracy, with up to 8% improvement in true positive rates and a 37% reduction in false positive rates. Together, these advantages enable control of the trade-off between speed and stability of adaptation, and facilitate human-in-the-loop sequential optimization.
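The abstract does not spell out the estimator, so the following is only a minimal sketch of the general idea under assumptions of our own, not the authors' implementation. It assumes binary rewards, per-arm `(trials, successes)` counts per time period, a trailing window of `window` periods summed into cumulative totals, and a normal-approximation empirical Bayes rule that shrinks each arm's rate toward the mean of all arm rates. The names `windowed_totals`, `shrunk_rates`, `History`, and `window` are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical input format: for each arm, a list of (trials, successes)
# counts per time period, oldest first. Recent periods may be incomplete
# because feedback arrives with a delay.
History = Dict[str, List[Tuple[int, int]]]


def windowed_totals(history: History, window: int) -> Dict[str, Tuple[int, int]]:
    """Sum trials and successes over the most recent `window` periods per arm."""
    if window < 1:
        raise ValueError("window must be at least 1")
    totals = {}
    for arm, periods in history.items():
        recent = periods[-window:]
        trials = sum(n for n, _ in recent)
        successes = sum(s for _, s in recent)
        totals[arm] = (trials, successes)
    return totals


def shrunk_rates(totals: Dict[str, Tuple[int, int]]) -> Dict[str, float]:
    """Shrink each arm's windowed success rate toward the mean of all arm rates.

    Normal-approximation empirical Bayes: the between-arm variance tau^2 is
    estimated as the spread of raw rates minus the average sampling variance,
    and each arm keeps a fraction tau^2 / (tau^2 + sigma_i^2) of its
    deviation from the mean.
    """
    rates = {arm: s / n for arm, (n, s) in totals.items() if n > 0}
    k = len(rates)
    if k < 2:
        return dict(rates)  # nothing to pool across
    grand = sum(rates.values()) / k
    spread = sum((r - grand) ** 2 for r in rates.values()) / (k - 1)
    # Binomial sampling variance of each arm's rate, using the mean rate.
    sampling = {arm: grand * (1 - grand) / totals[arm][0] for arm in rates}
    tau2 = max(spread - sum(sampling.values()) / k, 0.0)
    shrunk = {}
    for arm, r in rates.items():
        denom = tau2 + sampling[arm]
        weight = tau2 / denom if denom > 0 else 0.0
        shrunk[arm] = grand + weight * (r - grand)
    return shrunk
```

For example, with three arms whose last two periods give raw rates of 0.115, 0.20, and 0.15 on 200 trials each, this sketch moves them to roughly 0.129, 0.184, and 0.152, pulling each toward their mean of 0.155 while preserving their order. A larger `window` smooths over noisy or incomplete recent periods at the cost of tracking non-stationary rewards more slowly, which mirrors the speed versus stability trade-off described above.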

Related readings and updates.

Only Pay for What Is Uncertain: Variance-Adaptive Thompson Sampling
May 3, 2024 | Research areas: Data Science and Annotation; Methods and Algorithms | Conference: ICLR

Most bandit algorithms assume that the reward variances or their upper bounds are known, and that they are the same for all arms. This naturally leads to suboptimal performance and higher regret due to variance overestimation. On the other hand, underestimated reward variances may lead to linear regret due to committing early to a suboptimal arm. This motivated prior works on variance-adaptive frequentist algorithms, which have strong...

Interpretable Adaptive Optimization
December 7, 2021 | Research areas: Data Science and Annotation; Methods and Algorithms

Providing new features while preserving user privacy requires techniques for learning from private and anonymized user feedback. To learn quickly and accurately, we develop and employ statistical learning algorithms that help us overcome multiple challenges arising from sampling noise, applications of differential privacy, and delays that may be present in the data. These algorithms enable teams at Apple to measure and understand which user experiences are the best. This understanding leads to continual improvements across Apple's products and services to drive better experiences. We provide aspects of this understanding to the Apple developer community through features such as product page optimization.
