Query Auto-Completion (QAC) is a critical feature of modern search systems that improves search efficiency by suggesting completions as users type. However, existing approaches face fundamental challenges: traditional retrieve-and-rank pipelines have poor long-tail coverage and require extensive feature engineering, while recent generative methods suffer from hallucination and safety risks. We present a unified framework that reformulates QAC as end-to-end list generation through Retrieval-Augmented Generation (RAG) and multi-objective Direct Preference Optimization (DPO).

Our approach combines three key innovations:

  1. Reformulating QAC as end-to-end list generation with multi-objective optimization;
  2. A comprehensive methodology combining RAG, multi-objective DPO with learned and rule-based verifiers, and iterative critique-revision for high-quality synthetic data;
  3. A hybrid serving architecture enabling efficient production deployment under strict latency constraints.
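As a rough illustration of the second point, the sketch below shows one way several verifier scores could be collapsed into a single preference pair and scored with the standard DPO objective. It is a minimal sketch under stated assumptions, not the authors' implementation: the weighted-sum aggregation, the objective names (`relevance`, `safety`), the helper names, and `beta=0.1` are all hypothetical, and the log-probabilities are assumed to be precomputed by a policy model and a frozen reference model.

```python
import math
from typing import Dict, List, Tuple

# A candidate is a generated suggestion list plus per-objective verifier scores.
# The objective names used below are hypothetical placeholders.
Candidate = Tuple[List[str], Dict[str, float]]


def combined_score(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of verifier scores (an assumed aggregation rule)."""
    return sum(weights[name] * scores[name] for name in weights)


def pick_preference_pair(
    candidates: List[Candidate], weights: Dict[str, float]
) -> Tuple[List[str], List[str]]:
    """Return (chosen, rejected): the best and worst lists by combined score."""
    ranked = sorted(
        candidates, key=lambda c: combined_score(c[1], weights), reverse=True
    )
    return ranked[0][0], ranked[-1][0]


def dpo_loss(
    policy_logp_chosen: float,
    policy_logp_rejected: float,
    ref_logp_chosen: float,
    ref_logp_rejected: float,
    beta: float = 0.1,
) -> float:
    """Standard DPO loss for one pair: -log(sigmoid(beta * log-ratio margin))."""
    margin = beta * (
        (policy_logp_chosen - ref_logp_chosen)
        - (policy_logp_rejected - ref_logp_rejected)
    )
    # -log(sigmoid(m)) == log(1 + exp(-m)); branch to avoid overflow in exp().
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

With such a setup, `pick_preference_pair` would select which generated list is treated as preferred, and `dpo_loss` would be averaged over pairs during training; the loss falls as the policy assigns relatively more probability to the chosen list than the reference model does.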

Evaluation on a large-scale commercial search platform demonstrates substantial improvements. Offline metrics show gains across all dimensions, human evaluation yields preference scores of +0.40 to +0.69, and a controlled online experiment achieves a 5.44% reduction in keystrokes and a 3.46% increase in suggestion adoption. These results indicate that unified generation with RAG and multi-objective alignment is an effective solution for production QAC.

This work marks a shift from retrieve-and-rank pipelines to end-to-end generation powered by large language models, RAG, and multi-objective alignment, and it establishes a production-validated framework that can benefit the broader search and recommendation industry.

Related readings and updates.

Traditional query auto-completion (QAC) relies heavily on search logs collected over many users. However, in on-device email search, the scarcity of logs and the governing privacy constraints make QAC a challenging task. In this work, we propose an on-device QAC method that runs directly on users’ devices, where users’ sensitive data and interaction logs are not collected, shared, or aggregated through web services. This method retrieves…

This paper was accepted at the UncertaiNLP workshop at EACL 2024.

Large language models (LLMs) have the remarkable ability to solve new tasks with just a few examples, but they need access to the right tools. Retrieval Augmented Generation (RAG) addresses this problem by retrieving a list of relevant tools for a given task. However, RAG’s tool retrieval step requires all the required information to be explicitly present in the query. This is a…
