Optimal transport (OT) theory has been used in machine learning to study and characterize maps that can efficiently push forward a probability measure onto another. Recent works have drawn inspiration from Brenier's theorem, which states that when the ground cost is the squared-Euclidean distance, the "best" map to morph a continuous measure in $\mathcal{P}(\mathbb{R}^d)$ into another must be the gradient of a convex function. To exploit that result, Makkuva et al. (2020) and Korotin et al. (2020) consider maps $T = \nabla f_\theta$, where $f_\theta$ is an input convex neural network (ICNN), as defined by Amos et al. (2017), and fit $\theta$ with SGD using samples.
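For concreteness, here is a minimal JAX sketch of that ICNN parameterization: a scalar potential $f_\theta$ built with nonnegative hidden-to-hidden weights and convex, nondecreasing activations, whose gradient (obtained by autodiff) serves as the candidate map $T = \nabla f_\theta$. The layer sizes, initialization, and helper names (`init_icnn`, `icnn`, `transport`) are illustrative assumptions, not the exact architectures used in the cited papers.

```python
import jax
import jax.numpy as jnp

def init_icnn(key, dim, hidden=(64, 64)):
    """Illustrative ICNN parameters: an x-only first layer, then layers mixing z and x."""
    sizes = hidden + (1,)
    params, prev = [], None
    for h in sizes:
        key, kx, kz = jax.random.split(key, 3)
        layer = {"Wx": 0.1 * jax.random.normal(kx, (dim, h)), "b": jnp.zeros(h)}
        if prev is not None:
            layer["Wz"] = 0.1 * jax.random.normal(kz, (prev, h))
        params.append(layer)
        prev = h
    return params

def icnn(params, x):
    """Scalar potential f_theta(x), convex in x: z-to-z weights are kept
    nonnegative (softplus reparameterization) and activations are convex
    and nondecreasing, following the Amos et al. (2017) recipe."""
    z = jax.nn.softplus(x @ params[0]["Wx"] + params[0]["b"])
    for layer in params[1:]:
        Wz = jax.nn.softplus(layer["Wz"])  # enforce nonnegative z-weights
        z = jax.nn.softplus(z @ Wz + x @ layer["Wx"] + layer["b"])
    return z.squeeze()

# Brenier-style candidate map: T = grad_x f_theta, vectorized over a batch.
transport = jax.vmap(jax.grad(icnn, argnums=1), in_axes=(None, 0))

key = jax.random.PRNGKey(0)
params = init_icnn(key, dim=2)
x = jax.random.normal(key, (8, 2))
print(transport(params, x).shape)  # (8, 2): mapped samples
```

Fitting $\theta$ in this setup additionally requires handling the convex conjugate of $f_\theta$, which is one of the difficulties discussed next.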

Despite their mathematical elegance, fitting OT maps with ICNNs raises several challenges, due notably to the many constraints imposed on $\theta$, the need to approximate the conjugate of $f_\theta$, and the fact that they only apply to the squared-Euclidean cost. More generally, we question the relevance of using Brenier’s result, which only applies to densities, to constrain the architecture of candidate maps fitted on samples. Motivated by these limitations, we propose a radically different approach to estimating OT maps:

Given a cost $c$ and a reference measure $\rho$, we introduce a regularizer, the Monge gap $\mathcal{M}^c_{\rho}(T)$ of a map $T$. That gap quantifies how far a map $T$ deviates from the ideal properties we expect from a $c$-OT map. In practice, we drop all architecture requirements for $T$ and simply minimize a distance (e.g., the Sinkhorn divergence) between $T\sharp\mu$ and $\nu$, regularized by $\mathcal{M}^c_\rho(T)$. We study $\mathcal{M}^c_{\rho}$ and show how our simple pipeline significantly outperforms other baselines in practice.
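The following self-contained JAX sketch illustrates this pipeline under stated assumptions: $c$ is the squared-Euclidean cost, the reference measure is $\rho = \mu$, the map is an unconstrained MLP, and a plain entropic OT cost computed by log-domain Sinkhorn stands in both for the fitting divergence and for the $W_c(\rho, T\sharp\rho)$ term inside the Monge gap (the paper's fitting term is a debiased Sinkhorn divergence). All function names here are our own, not the paper's.

```python
import jax
import jax.numpy as jnp

def sinkhorn_cost(x, y, eps=0.05, iters=100):
    """Entropic OT cost <P, C> between two uniform point clouds,
    via log-domain Sinkhorn iterations (squared-Euclidean ground cost)."""
    C = jnp.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1)
    n, m = C.shape
    loga, logb = -jnp.log(n) * jnp.ones(n), -jnp.log(m) * jnp.ones(m)
    f, g = jnp.zeros(n), jnp.zeros(m)
    for _ in range(iters):
        f = -eps * jax.scipy.special.logsumexp((g[None, :] - C) / eps + logb[None, :], axis=1)
        g = -eps * jax.scipy.special.logsumexp((f[:, None] - C) / eps + loga[:, None], axis=0)
    P = jnp.exp((f[:, None] + g[None, :] - C) / eps + loga[:, None] + logb[None, :])
    return jnp.sum(P * C)

def init_mlp(key, dim, hidden=(128, 128)):
    """An unconstrained MLP map: no convexity or gradient structure imposed."""
    sizes = (dim,) + hidden + (dim,)
    keys = jax.random.split(key, len(sizes) - 1)
    return [(0.1 * jax.random.normal(k, (a, b)), jnp.zeros(b))
            for k, a, b in zip(keys, sizes[:-1], sizes[1:])]

def mlp_map(params, x):
    for W, b in params[:-1]:
        x = jax.nn.relu(x @ W + b)
    W, b = params[-1]
    return x @ W + b

def monge_gap(params, rho):
    """Sample-based Monge gap for c = squared-Euclidean:
    M^c_rho(T) = E_rho[c(x, T(x))] - W_c(rho, T#rho), which is nonnegative
    and vanishes when T moves the samples of rho onto their images at optimal cost."""
    T_rho = mlp_map(params, rho)
    avg_cost = jnp.mean(jnp.sum((rho - T_rho) ** 2, axis=-1))
    return avg_cost - sinkhorn_cost(rho, T_rho)

def loss(params, mu, nu, lam=1.0):
    """Fitting term between T#mu and nu, plus the Monge gap with rho = mu."""
    return sinkhorn_cost(mlp_map(params, mu), nu) + lam * monge_gap(params, mu)

# One SGD step on samples from mu and nu:
key = jax.random.PRNGKey(0)
params = init_mlp(key, dim=2)
mu = jax.random.normal(key, (64, 2))
nu = jax.random.normal(jax.random.PRNGKey(1), (64, 2)) + 2.0
grads = jax.grad(loss)(params, mu, nu)
params = jax.tree_util.tree_map(lambda p, g: p - 1e-2 * g, params, grads)
```

Because the regularizer alone penalizes any deviation from $c$-optimality, the map network needs no special architecture: any model whose pushforward can be differentiated through works in this loop.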

Related readings and updates.

Monge, Bregman and Occam: Interpretable Optimal Transport in High-Dimensions with Feature-Sparse Maps

Optimal transport (OT) theory focuses, among all maps $T:\mathbb{R}^d\rightarrow \mathbb{R}^d$ that can morph a probability measure onto another, on those that are the "thriftiest", i.e., such that the averaged cost $c(\mathbf{x}, T(\mathbf{x}))$ between $\mathbf{x}$ and its image $T(\mathbf{x})$ be as small as possible. Many computational approaches have been proposed to estimate such Monge maps when $c$ is the…
Supervised Training of Conditional Monge Maps

Optimal transport (OT) theory describes general principles to define and select, among many possible choices, the most efficient way to map a probability measure onto another. That theory has been mostly used to estimate, given a pair of source and target probability measures $(\mu,\nu)$, a parameterized map $T_\theta$ that can efficiently map $\mu$ onto $\nu$. In many applications, such as predicting cell responses to treatments, the…