paperDecember 2025

The Communication Complexity of Distributed Estimation

AuthorsParikshit Gopalan, Raghu Meka†, Prasad Raghavendra‡, Mihir Singhal‡, Avi Wigderson§

We study an extension of the standard two-party communication model in which Alice and Bob hold probability distributions $p$ and $q$ over domains $X$ and $Y$ , respectively. Their goal is to estimate

$\mathbb{E}_{x \sim p, y \sim q}[f(x, y)]$

to within additive error $\varepsilon$ for a bounded function $f$ , known to both parties. We refer to this as the distributed estimation problem. Special cases of this problem arise in a variety of areas including sketching, databases and learning. Our goal is to understand how the required communication scales with the communication complexity of $f$ and the error parameter $\varepsilon$ .

The random sampling approach — estimating the mean by averaging over $O(1/\varepsilon^2)$ random samples — requires $O(R(f)/\varepsilon^2)$ total communication, where $R(f)$ is the randomized communication complexity of $f$ . We design a new debiasing protocol which improves the dependence on $1/\varepsilon$ to be linear instead of quadratic. Additionally we show better upper bounds for several special classes of functions, including the Equality and Greater-than functions. We introduce lower bound techniques based on spectral methods and discrepancy, and show the optimality of many of our protocols: the debiasing protocol is tight for general functions, and that our protocols for the equality and greater-than functions are also optimal. Furthermore, we show that among full-rank Boolean functions, Equality is essentially the easiest.

† University of California, Los Angeles
‡ University of California, Berkeley
§ Institute for Advanced Study (IAS)

The Communication Complexity of Distributed Estimation

Related readings and updates.

Private Vector Mean Estimation in the Shuffle Model: Optimal Rates Require Many Messages

Lower Bounds for Locally Private Estimation via Communication Complexity

Discover opportunities in Machine Learning.