View publication

While there exists a large amount of literature on the general challenges and best practices for trustworthy online A/B testing, there are limited studies on sample size estimation, which plays a crucial role in trustworthy and efficient A/B testing that ensures the resulting inference has a sufficient power and type I error control. For example, when the sample size is under-estimated the statistical inference, even with the correct analysis methods, will not be able to detect the true significant improvement leading to misinformed and costly decisions. This paper addresses this fundamental gap by developing new sample size calculation methods for correlated data, as well as absolute versus relative treatment effects, both ubiquitous in online experiments. Additionally, we address a practical question of the minimal observed difference that will be statistically significant and how it relates to average treatment effect and sample size calculation. All proposed methods are accompanied by mathematical proofs, illustrative examples, and simulations. We end by sharing some best practices on various practical topics on sample size calculation and experimental design.

Related readings and updates.

On Efficient and Statistical Quality Estimation for Data Annotation

Annotated data is an essential ingredient to train, evaluate, compare and productionalize machine learning models. It is therefore imperative that annotations are of high quality. For their creation, good quality management and thereby reliable quality estimates are needed. Then, if quality is insufficient during the annotation process, rectifying measures can be taken to improve it. For instance, project managers can use quality estimates to…
See paper details

Privacy-Preserving Quantile Treatment Effect Estimation for Randomized Controlled Trials

In accordance with the principle of "data minimization," many internet companies are opting to record less data. However, this is often at odds with A/B testing efficacy. For experiments with units with multiple observations, one popular data-minimizing technique is to aggregate data for each unit. However, exact quantile estimation requires the full observation-level data. In this paper, we develop a method for approximate Quantile Treatment…
See paper details