# Contrasting Multiple Representations with the Multi-Marginal Matching Gap

AuthorsZoe Piran, Michal Klein, James Thornton, Marco Cuturi Cameto

content type paperpublished July 2024

AuthorsZoe Piran, Michal Klein, James Thornton, Marco Cuturi Cameto

Learning meaningful representations of complex objects that can be seen through multiple ($k\geq 3$) views or modalities is a core task in machine learning. Existing methods use losses originally intended for paired views, and extend them to $k$ views, either by instantiating $\tfrac12k(k-1)$ loss-pairs, or by using reduced embeddings, following a $\textit{one vs. average-of-rest}$ strategy. We propose the multi-marginal matching gap (M3G), a loss that borrows tools from multi-marginal optimal transport (MM-OT) theory to simultaneously incorporate all $k$ views. Given a batch of $n$ points, each seen as a $k$-tuple of views subsequently transformed into $k$ embeddings, our loss contrasts the cost of matching these $n$ ground-truth $k$-tuples with the MM-OT polymatching cost, which seeks $n$ optimally arranged $k$-tuples chosen within these $n\times k$ vectors. While the exponential complexity $O(n^k$) of the MM-OT problem may seem daunting, we show in experiments that a suitable generalization of the Sinkhorn algorithm for that problem can scale to, e.g., $k=3\sim 6$ views using mini-batches of size $64~\sim128$. Our experiments demonstrate improved performance over multiview extensions of pairwise losses, for both self-supervised and multimodal tasks.

Consider a multi-class labelling problem, where the labels can take values in [k], and a predictor predicts a distribution over the labels. In this work, we study the following foundational question: Are there notions of multi-class calibration that give strong guarantees of meaningful predictions and can be achieved in time and sample complexities polynomial in k? Prior notions of calibration exhibit a tradeoff between computational efficiency…

See paper detailsIn this work, we propose a new algorithm ProjectiveGeometryResponse (PGR) for locally differentially private (LDP) frequency estimation. For a universe size of kkk and with nnn users, our ε\varepsilonε-LDP algorithm has communication cost ⌈log2k⌉\lceil\log_2k\rceil⌈log2k⌉ bits in the private coin setting and εlog2e+O(1)\varepsilon\log_2 e + O(1)εlog2e+O(1) in the public coin setting, and has computation cost O(n+kexp(ε)logk)O(n +…

See paper details