# Monge, Bregman and Occam: Interpretable Optimal Transport in High-Dimensions with Feature-Sparse Maps

AuthorsMarco Cuturi, Michal Klein, Pierre Ablin

content type paperpublished June 2023

AuthorsMarco Cuturi, Michal Klein, Pierre Ablin

Optimal transport (OT) theory focuses, among all maps $T:\mathbb{R}^d\rightarrow \mathbb{R}^d$ that can morph a probability measure onto another, on those that are the "thriftiest", i.e. such that the averaged cost $c(\mathbf{x}, T(\mathbf{x}))$ between $\mathbf{x}$ and its image $T(\mathbf{x})$ be as small as possible. Many computational approaches have been proposed to estimate such *Monge* maps when $c$ is the $\ell_2^2$ distance, e.g., using entropic maps (Pooladian and Niles-Weed, 2021), or neural networks (Makkuva et al., 2020;
Korotin et al., 2020). We propose a new model for transport maps, built on a family of translation invariant costs $c(\mathbf{x},\mathbf{y}):=h(\mathbf{x}-\mathbf{y})$, where $h:=\tfrac{1}{2}\|\cdot\|_2^2+\tau$ and $\tau$ is a regularizer. We propose a generalization of the entropic map suitable for $h$, and highlight a surprising link tying it with the *Bregman* centroids of the divergence $D_h$ generated by $h$, and the proximal operator of $\tau$. We show that choosing a sparsity-inducing norm for $\tau$ results in maps that apply *Occam*'s razor to transport, in the sense that the *displacement* vectors $\Delta(\mathbf{x}):= T(\mathbf{x})-\mathbf{x}$ they induce are sparse, with a sparsity pattern that varies depending on $\mathbf{x}$. We showcase the ability of our method to estimate meaningful OT maps for high-dimensional single-cell transcription data, in the $34000$-$d$ space of gene counts for cells, *without* using dimensionality reduction, thus retaining the ability to interpret all displacements at the gene level.

Optimal transport (OT) theory has been been used in machine learning to study and characterize maps that can push-forward efficiently a probability measure onto another.
Recent works have drawn inspiration from Brenier's theorem, which states that when the ground cost is the squared-Euclidean distance, the "best" map to morph a continuous measure in P(Rd)\mathcal{P}(\mathbb{R}^d)P(Rd) into another must be the gradient of a convex function.
To…

See paper detailsPeople have an innate capability to understand the 3D visual world and make predictions about how the world could look from different points of view, even when relying on few visual observations. We have this spatial reasoning ability because of the rich mental models of the visual world we develop over time. These mental models can be interpreted as a prior belief over which configurations of the visual world are most likely to be observed. In this case, a prior is a probability distribution over the 3D visual world.