
Recent advancements in long-context language models (LCLMs) have the potential to transform Retrieval-Augmented Generation (RAG) by simplifying pipelines. With their extended context windows, LCLMs can process entire knowledge bases and directly handle retrieval and reasoning, a capability we define as In-Context Retrieval and Reasoning (ICR2). However, existing benchmarks like LOFT often overestimate LCLM performance because they lack sufficiently challenging contexts. To address this, we introduce ICR2, a benchmark designed for more realistic evaluation and training of LCLMs. This dataset simulates practical scenarios by including confounding documents retrieved using strong retrievers. Additionally, we propose three methods to enhance LCLM performance: (1) retrieve-then-generate fine-tuning, (2) explicit modeling of a retrieval head trained jointly with the generation head, and (3) retrieval-attention-probing decoding, which uses attention heads to filter and refine long contexts. Through extensive benchmarking of four well-known LCLMs on LOFT and ICR2, we show that our best approach, applied to Mistral-7B, achieves significant improvements: +17 and +15 points on LOFT, and +13 and +2 points on ICR2, compared to zero-shot RAG and in-domain supervised fine-tuned models, respectively. It even outperforms GPT-4 on most tasks, despite its much smaller size.
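To make the attention-probing idea concrete, the sketch below scores each candidate document by how much attention the final query token pays to its span and keeps only the top-scoring documents for a subsequent generation pass. It is a minimal illustration assuming a Hugging Face causal LM; the checkpoint name, the choice of upper-layer heads as the probe, and the `probe_and_filter` helper are illustrative assumptions, not the paper's exact recipe.

```python
# Minimal sketch: use attention weights to filter a long context before generation.
# Assumptions (not the paper's exact method): which layers/heads to probe, the
# top-k selection rule, and the Mistral-7B checkpoint name.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "mistralai/Mistral-7B-Instruct-v0.2"  # assumed checkpoint
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
# "eager" attention so per-head attention weights are returned by the forward pass.
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, attn_implementation="eager")
model.eval()


def probe_and_filter(question: str, documents: list[str], top_k: int = 4) -> list[str]:
    """Score each document by the attention the final question token pays to its span."""
    doc_spans, pieces = [], []
    for doc in documents:
        ids = tokenizer(doc + "\n", add_special_tokens=False)["input_ids"]
        start = sum(len(p) for p in pieces)
        pieces.append(ids)
        doc_spans.append((start, start + len(ids)))
    q_ids = tokenizer("\nQuestion: " + question, add_special_tokens=False)["input_ids"]
    input_ids = torch.tensor([sum(pieces, []) + q_ids]).to(model.device)

    with torch.no_grad():
        out = model(input_ids, output_attentions=True)

    # Average attention from the last (query) token over the upper layers' heads
    # (an assumed probe location).
    attn = torch.stack(out.attentions[-8:])                 # (layers, batch, heads, seq, seq)
    last_tok_attn = attn[:, 0, :, -1, :].mean(dim=(0, 1))   # (seq,)

    # Sum the probe's attention mass over each document's token span.
    scores = [last_tok_attn[s:e].sum().item() for (s, e) in doc_spans]
    keep = sorted(range(len(documents)), key=lambda i: scores[i], reverse=True)[:top_k]
    return [documents[i] for i in sorted(keep)]  # preserve original order for the final prompt
```

The retained documents would then be re-packed into a much shorter prompt for the final answer-generation pass, which is what lets the probe act as a lightweight in-context retriever.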
