
Ranking is a fundamental and popular problem in search. However, existing ranking algorithms usually restrict the granularity of ranking to full passages or require a specific dense index for each desired level of granularity. Such lack of flexibility in granularity negatively affects many applications that can benefit from more granular ranking, such as sentence-level ranking for open-domain question-answering, or proposition-level ranking for attribution. In this work, we introduce the idea of any-granularity ranking which leverages multi-vector approaches to rank at varying levels of granularity while maintaining encoding at a single (coarser) level of granularity. We propose a multi-granular contrastive loss for training multi-vector approaches, and validate its utility with both sentences and propositions as ranking units. Finally, we demonstrate the application of proposition-level ranking to post-hoc citation addition in retrieval-augmented generation, surpassing the performance of prompt-driven citation generation.
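To make the idea concrete, below is a minimal sketch, assuming a ColBERT-style multi-vector encoder: the passage is encoded once at the coarse level, and finer-grained units (sentences or propositions) are scored by restricting the MaxSim interaction to each unit's token span. The span-restricted scoring and the InfoNCE-style loss are illustrative assumptions for exposition, not the paper's exact formulation or released code.

```python
# Illustrative sketch only: a ColBERT-style multi-vector encoding of a passage is
# reused to score finer-grained units (sentences or propositions) by restricting
# the MaxSim interaction to each unit's token span. The encoder, spans, and loss
# below are assumptions for exposition, not the paper's exact formulation.
import torch
import torch.nn.functional as F


def maxsim_score(query_vecs: torch.Tensor, doc_vecs: torch.Tensor) -> torch.Tensor:
    """Late-interaction relevance: for each query token, take the max similarity
    over document tokens, then sum. Shapes: query_vecs (Lq, d), doc_vecs (Ld, d)."""
    return (query_vecs @ doc_vecs.T).max(dim=-1).values.sum()


def any_granularity_scores(query_vecs, passage_vecs, unit_spans):
    """Score sub-passage units given as [start, end) token spans, reusing the
    single passage-level encoding -- no separate index per granularity."""
    return torch.stack([maxsim_score(query_vecs, passage_vecs[s:e])
                        for (s, e) in unit_spans])


def multi_granular_contrastive_loss(query_vecs, positive_unit, negative_units, tau=0.05):
    """One hypothetical reading of a multi-granular contrastive term: InfoNCE with
    the gold unit at the target granularity as the positive and other units
    (from the same or other passages) as negatives."""
    scores = torch.stack([maxsim_score(query_vecs, u)
                          for u in [positive_unit] + list(negative_units)]) / tau
    return F.cross_entropy(scores.unsqueeze(0), torch.zeros(1, dtype=torch.long))


# Toy usage with random embeddings standing in for a trained encoder.
d = 128
query_vecs = F.normalize(torch.randn(6, d), dim=-1)      # 6 query token vectors
passage_vecs = F.normalize(torch.randn(40, d), dim=-1)   # 40 passage token vectors
sentence_spans = [(0, 14), (14, 27), (27, 40)]            # hypothetical sentence boundaries

print(any_granularity_scores(query_vecs, passage_vecs, sentence_spans))
print(multi_granular_contrastive_loss(query_vecs,
                                      passage_vecs[0:14],
                                      [passage_vecs[14:27], passage_vecs[27:40]]))
```

The key design point the sketch illustrates is that only one (passage-level) index is built; different ranking granularities come for free at scoring time by slicing the stored token vectors.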

Related readings and updates.

Neural Fisher Kernel: Low-rank Approximation and Knowledge Distillation

In this paper, we study the representation of neural networks from the view of kernels. We first define the Neural Fisher Kernel (NFK), which is the Fisher Kernel applied to neural networks. We show that the NFK can be computed for both supervised and unsupervised learning models, and can therefore serve as a unified tool for representation extraction. Furthermore, we show that practical NFKs exhibit low-rank structures. We then propose an efficient…
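For reference, the classical Fisher kernel underlying this definition is K(x, x') = g(x)^T F^{-1} g(x'), where g(x) is the per-example gradient of the log-likelihood with respect to the model parameters and F is the Fisher information matrix. The sketch below applies that standard formula to a toy supervised network; the model, data, and damping constant are placeholders, not the paper's setup.

```python
# Hypothetical illustration (not the paper's code) of the classical Fisher kernel
# applied to a neural network:
#   g(x) = grad_theta log p(y | x, theta),  F = E[g g^T],  K(x, x') = g(x)^T F^{-1} g(x').
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(4, 8), nn.Tanh(), nn.Linear(8, 3))  # toy classifier
params = [p for p in model.parameters() if p.requires_grad]


def per_example_grad(x, y):
    """Flattened gradient of the log-likelihood for a single labelled example."""
    logp = torch.log_softmax(model(x), dim=-1)[y]
    grads = torch.autograd.grad(logp, params)
    return torch.cat([g.reshape(-1) for g in grads])


# Toy data standing in for a real dataset.
X = torch.randn(16, 4)
Y = torch.randint(0, 3, (16,))
G = torch.stack([per_example_grad(x, y) for x, y in zip(X, Y)])  # (n, n_params)

F_hat = G.T @ G / G.shape[0]                                     # empirical Fisher
F_inv = torch.linalg.pinv(F_hat + 1e-6 * torch.eye(F_hat.shape[0]))  # damped inverse
K = G @ F_inv @ G.T                                              # (n, n) NFK Gram matrix
print(K.shape)
```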

Implicit Greedy Rank Learning in Autoencoders via Overparameterized Linear Networks

This paper was accepted at the workshop on Overparameterization: Pitfalls and Opportunities at the ICML 2021 conference. Deep linear networks trained with gradient descent yield low-rank solutions, as is typically studied in matrix factorization. In this paper, we take a step further and analyze implicit rank regularization in autoencoders. We show greedy learning of low-rank latent codes induced by a linear sub-network at the autoencoder…