Two-Layer Bandit Optimization for Recommendations

Authors: Siyong Ma, Puja Das, Sofia Maria Nikolakaki, Qifeng Chen, Humeyra Topcu Altintas

Online commercial app marketplaces efficiently serve millions of apps to billions of users. Bandit optimization algorithms are used to keep recommendations relevant and to converge to the best-performing content over time. However, directly applying bandits to real-world systems, where the catalog of items is dynamic and continuously refreshed, is not straightforward. One challenge we face is the existence of several competing content-surfacing components, a phenomenon that is not unusual in large-scale recommender systems. This often leads to scenarios where improving the recommendations in one component degrades the performance of another, i.e., "cannibalization". To address this problem, we introduce an efficient two-layer bandit approach that is contextualized to cohorts of users with similar taste. We mitigate cannibalization at runtime within a single multi-intent content-surfacing platform by formalizing relevant offline evaluation metrics and by incorporating cross-component interactions into the bandit rewards. User engagement in our proposed system has more than doubled, as measured by online A/B tests.
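The abstract does not spell out the algorithm, so the following is only an illustrative sketch of what a cohort-contextualized two-layer bandit could look like, not the authors' method. It assumes Beta-Bernoulli Thompson sampling at both layers, a top layer that picks a content-surfacing component, a bottom layer that picks an item within it, and a component reward reduced by a hypothetical `penalty` times the engagement lost in other components. The class names, the `lost_elsewhere` signal, and the penalty form are all assumptions made for illustration.

```python
import random
from collections import defaultdict


class BetaArm:
    """Beta-Bernoulli posterior for one arm (assumed Thompson sampling)."""

    def __init__(self):
        self.alpha = 1.0  # prior pseudo-count of successes
        self.beta = 1.0   # prior pseudo-count of failures

    def sample(self, rng):
        return rng.betavariate(self.alpha, self.beta)

    def update(self, reward):
        # Clip the reward to [0, 1] and treat it as a fractional success.
        r = min(1.0, max(0.0, reward))
        self.alpha += r
        self.beta += 1.0 - r


class TwoLayerBandit:
    """Hypothetical two-layer bandit, with separate posteriors per user cohort."""

    def __init__(self, catalog, penalty=0.5, seed=0):
        self.catalog = catalog  # {component name: [item, ...]}
        self.penalty = penalty  # assumed weight on cannibalized engagement
        self.rng = random.Random(seed)
        self.component_arms = defaultdict(BetaArm)  # key: (cohort, component)
        self.item_arms = defaultdict(BetaArm)       # key: (cohort, component, item)

    def select(self, cohort):
        # Layer 1: sample each component's posterior and keep the best draw.
        component = max(
            self.catalog,
            key=lambda c: self.component_arms[(cohort, c)].sample(self.rng),
        )
        # Layer 2: do the same over the items of the chosen component.
        item = max(
            self.catalog[component],
            key=lambda i: self.item_arms[(cohort, component, i)].sample(self.rng),
        )
        return component, item

    def update(self, cohort, component, item, engaged, lost_elsewhere=0.0):
        # The item layer sees the raw engagement signal.
        self.item_arms[(cohort, component, item)].update(float(engaged))
        # The component layer sees engagement minus a cross-component penalty,
        # so a component that only shifts engagement away from others gains less.
        adjusted = float(engaged) - self.penalty * lost_elsewhere
        self.component_arms[(cohort, component)].update(adjusted)


catalog = {"top_charts": ["app_a", "app_b"], "for_you": ["app_c", "app_d"]}
bandit = TwoLayerBandit(catalog, penalty=0.5, seed=42)
component, item = bandit.select(cohort="casual_gamers")
bandit.update("casual_gamers", component, item, engaged=True, lost_elsewhere=1.0)
```

Keeping the penalty at the component layer only is one possible design choice: item posteriors stay comparable within a component, while the choice between components reflects net rather than gross engagement.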


