Flow models parameterized as time-dependent velocity fields can generate data from noise by integrating an ODE. These models are often trained using flow matching, i.e. by sampling random pairs of noise and target points $(\mathbf{x}_0, \mathbf{x}_1)$ and ensuring that the velocity field is aligned, on average, with $\mathbf{x}_1 - \mathbf{x}_0$ when evaluated along the segment linking $\mathbf{x}_0$ to $\mathbf{x}_1$. While these pairs are sampled independently by default, they can also be selected more carefully by matching batches of $n$ noise points to $n$ target points using an optimal transport (OT) solver. Although promising in theory, the OT flow matching (OT-FM) approach is not widely used in practice. Zhang et al. (2025) recently pointed out that OT-FM truly starts paying off when the batch size $n$ grows significantly, which only a multi-GPU implementation of the Sinkhorn algorithm can handle. Unfortunately, the cost of running Sinkhorn can quickly balloon, requiring $O(n^2/\varepsilon^2)$ operations for every $n$ pairs used to fit the velocity field, where $\varepsilon$ is a regularization parameter that should typically be small to yield better results. To fulfill the theoretical promises of OT-FM, we propose to move away from batch-OT and rely instead on a semidiscrete OT (SD-OT) formulation that leverages the fact that the target dataset distribution is usually of finite size $N$. The SD-OT problem is solved by estimating a dual potential vector using SGD; using that vector, noise vectors freshly sampled at train time can then be matched with data points at the cost of a maximum inner product search (MIPS). Semidiscrete FM (SD-FM) removes the quadratic dependency on $n/\varepsilon$ that bottlenecks OT-FM. SD-FM beats both FM and OT-FM on all training metrics and inference budget constraints, across multiple datasets, for unconditional and conditional generation, and when using mean-flow models.
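
The semidiscrete pipeline summarized in the abstract can be illustrated with a small, self-contained sketch. It is not the authors' implementation: it assumes the cost $c(\mathbf{x}, \mathbf{y}) = \tfrac{1}{2}\|\mathbf{x} - \mathbf{y}\|^2$, standard Gaussian noise, uniform weights $1/N$ on the data points, and no entropic regularization. Under those assumptions the dual potential $\mathbf{g} \in \mathbb{R}^N$ can be fitted by stochastic gradient ascent, and a noise vector $\mathbf{x}$ is matched to the data point maximizing $\langle \mathbf{x}, \mathbf{y}_j \rangle + g_j - \tfrac{1}{2}\|\mathbf{y}_j\|^2$, which is an inner product search. The function names (`assign`, `fit_potential`, `sample_pair`) and the step-size schedule are hypothetical choices made for illustration, and the brute-force argmax stands in for a real MIPS index.

```python
import random


def assign(x, data, g):
    """Index j maximizing <x, y_j> + g_j - 0.5 * ||y_j||^2.

    Equivalent to argmin_j 0.5 * ||x - y_j||^2 - g_j (assumed cost).
    Brute force here; a MIPS index would replace this loop at scale.
    """
    def score(j):
        y = data[j]
        dot = sum(a * b for a, b in zip(x, y))
        return dot + g[j] - 0.5 * sum(b * b for b in y)

    return max(range(len(data)), key=score)


def fit_potential(data, steps=1000, batch=64, lr=0.5, seed=0):
    """Stochastic gradient ascent on the semidiscrete dual potential g."""
    rng = random.Random(seed)
    n, d = len(data), len(data[0])
    g = [0.0] * n
    for t in range(steps):
        counts = [0] * n
        for _ in range(batch):
            x = [rng.gauss(0.0, 1.0) for _ in range(d)]
            counts[assign(x, data, g)] += 1
        step = lr / (1 + t) ** 0.5  # assumed decaying step size
        for j in range(n):
            # Dual gradient: target mass 1/n minus mass currently assigned to j.
            g[j] += step * (1.0 / n - counts[j] / batch)
    return g


def sample_pair(data, g, rng):
    """Draw one flow matching training triple (t, x_t, x_1 - x_0)."""
    d = len(data[0])
    x0 = [rng.gauss(0.0, 1.0) for _ in range(d)]
    x1 = data[assign(x0, data, g)]  # semidiscrete match for fresh noise
    t = rng.random()
    xt = [(1 - t) * a + t * b for a, b in zip(x0, x1)]
    target = [b - a for a, b in zip(x0, x1)]
    return t, xt, target
```

Once `g` is fitted, each call to `sample_pair` costs one search over the dataset instead of a batch-level Sinkhorn solve; a velocity network would then be regressed onto `target` at the point `xt` and time `t`.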

Related readings and updates.

Flow models transform data gradually from one modality (e.g. noise) onto another (e.g. images). Such models are parameterized by a time-dependent velocity field, trained to fit segments connecting pairs of source and target points. When the pairing between source and target points is given, training flow models boils down to a supervised regression problem. When no such pairing exists, as is the case when generating data from noise, training…


Optimal transport (OT) theory focuses, among all maps $T:\mathbb{R}^d\rightarrow \mathbb{R}^d$ that can morph a probability measure onto another, on those that are the “thriftiest”, i.e. such that the averaged cost $c(\mathbf{x}, T(\mathbf{x}))$ between $\mathbf{x}$ and its image $T(\mathbf{x})$ is as small as possible. Many computational approaches have been proposed to estimate such Monge maps when $c$ is the $\ell_2^2$ distance, e.g., using…
