We present AMES (Approximate Multimodal Enterprise Search), a unified, backend-agnostic multimodal late-interaction retrieval architecture. AMES demonstrates that fine-grained multimodal late-interaction retrieval can be deployed within a production-grade enterprise search engine without architectural redesign. Text tokens, image patches, and video frames are embedded into a shared representation space using multi-vector encoders, enabling cross-modal retrieval without modality-specific retrieval logic. AMES employs a two-stage pipeline: parallel token-level ANN search with a per-document Top-M MaxSim approximation, followed by accelerator-optimized Exact MaxSim re-ranking. Experiments on the ViDoRe V3 benchmark show that AMES achieves competitive ranking performance within a scalable, production-ready Solr-based system.
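To make the two-stage pipeline concrete, below is a minimal sketch of late-interaction scoring with a Top-M approximation followed by exact MaxSim re-ranking. This is an illustration only, not the AMES implementation: the brute-force nearest-token lookup stands in for a real per-token ANN index, and all function names and parameters (`top_m`, `rerank_k`, etc.) are hypothetical.

```python
import numpy as np
from collections import defaultdict

def exact_maxsim(Q, D):
    # Late-interaction (MaxSim) score: for each query token vector,
    # take the maximum similarity over all document token vectors,
    # then sum over query tokens. Q: (n_query, dim), D: (n_doc, dim).
    return float((Q @ D.T).max(axis=1).sum())

def approx_candidates(Q, token_vecs, token_doc_ids, top_m=16):
    # Stage 1: for each query token, find its top-M most similar token
    # vectors across the whole corpus (brute force here, standing in
    # for a parallel token-level ANN search), and accumulate an
    # approximate per-document MaxSim score from those hits alone.
    sims = Q @ token_vecs.T                       # (n_query, n_corpus_tokens)
    scores = defaultdict(float)
    for qi in range(Q.shape[0]):
        hits = np.argpartition(-sims[qi], top_m)[:top_m]
        best = defaultdict(float)                 # per-doc max for this token
        for t in hits:
            d = token_doc_ids[t]
            best[d] = max(best[d], float(sims[qi, t]))
        for d, s in best.items():
            scores[d] += s
    return scores

def two_stage_search(Q, docs, top_m=16, rerank_k=10):
    # docs maps doc_id -> (n_tokens, dim) array of token embeddings.
    token_vecs = np.vstack(list(docs.values()))
    token_doc_ids = [d for d, v in docs.items() for _ in range(v.shape[0])]
    candidates = approx_candidates(Q, token_vecs, token_doc_ids, top_m)
    shortlist = sorted(candidates, key=candidates.get, reverse=True)[:rerank_k]
    # Stage 2: exact MaxSim re-ranking of the shortlist (the step the
    # paper describes as accelerator-optimized).
    return sorted(((d, exact_maxsim(Q, docs[d])) for d in shortlist),
                  key=lambda kv: -kv[1])
```

Because Stage 1 only sees each query token's top-M matches, it can under-score a document whose best-matching token falls outside those hits; the exact re-ranking stage corrects this for the surviving candidates.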

Related readings and updates.

Multimodal Large Language Models (MLLMs) in real-world applications require access to external knowledge sources and must remain responsive to dynamic, ever-changing real-world information in order to address information-seeking and knowledge-intensive user queries. Existing approaches, such as retrieval-augmented generation (RAG) methods, search agents, and search-equipped MLLMs, often suffer from rigid pipelines, excessive search calls,…


This paper was accepted at the UncertaiNLP workshop at EACL 2024.

Large language models (LLMs) have the remarkable ability to solve new tasks with just a few examples, but they need access to the right tools. Retrieval-Augmented Generation (RAG) addresses this problem by retrieving a list of relevant tools for a given task. However, RAG’s tool-retrieval step requires all the necessary information to be explicitly present in the query. This is a…
