
This paper presents Wally, a private search system that supports efficient semantic and keyword search queries against large databases. When sufficiently many clients are making queries, Wally’s performance is significantly better than that of previous systems. In previous private search systems, the server must perform at least one expensive cryptographic operation per database entry for each client query, so performance degrades proportionally with the number of entries in the database. Wally removes this limitation: for each query, the server performs cryptographic operations against only a few database entries. We achieve these results by requiring each client to add a few fake queries and to send each query to the server, via an anonymous network, at independently chosen random instants. Additionally, each client uses somewhat homomorphic encryption (SHE) to hide whether a query is real or fake. Wally provides an (ε, δ)-differential privacy guarantee, an accepted standard for strong privacy. The number of fake queries each client must make depends inversely on the number of clients making queries, so the fake-query overhead vanishes as the number of clients increases, enabling scalability to millions of queries and large databases. Concretely, Wally can process eight million queries in 117 minutes, just under two hours, which is around four orders of magnitude faster than the state of the art.
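To make the fake-query mechanism concrete, the following is a minimal Python sketch, not Wally’s implementation: the function names, the noise formula, and the epoch length are hypothetical stand-ins chosen only to illustrate two ideas from the abstract, namely that the per-client fake-query count shrinks as the number of participating clients grows, and that every query (real or fake) is sent at an independently chosen random instant.

```python
import math
import random

def dummy_queries_per_client(epsilon: float, delta: float, num_clients: int) -> int:
    """Illustrative only: the aggregate noise needed for an (epsilon, delta)-DP
    guarantee is shared across all clients, so each client's fake-query budget
    shrinks as more clients participate. The formula below is a stand-in, not
    Wally's actual calibration."""
    total_noise = (2.0 / epsilon) * math.log(2.0 / delta)
    return max(1, math.ceil(total_noise / num_clients))

def schedule_queries(real_queries, epsilon, delta, num_clients, epoch_seconds=600):
    """Attach an independently chosen random send instant (within a hypothetical
    epoch) to every real and fake query, so timing does not reveal which is which."""
    fake_count = dummy_queries_per_client(epsilon, delta, num_clients)
    queries = [(q, False) for q in real_queries] + [(None, True)] * fake_count
    scheduled = [(random.uniform(0, epoch_seconds), payload, is_fake)
                 for payload, is_fake in queries]
    return sorted(scheduled, key=lambda item: item[0])

if __name__ == "__main__":
    plan = schedule_queries(["nearest coffee shop"], epsilon=1.0, delta=1e-6,
                            num_clients=1_000_000)
    for send_at, payload, is_fake in plan:
        print(f"t={send_at:7.1f}s  fake={is_fake}  payload={payload}")
```

In the actual system, the payload of a fake query would additionally be encrypted under SHE, so the server cannot distinguish it from a real one by content either; the sketch omits that step.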

Related readings and updates.

Combining Machine Learning and Homomorphic Encryption in the Apple Ecosystem

At Apple, we believe privacy is a fundamental human right. Our work to protect user privacy is informed by a set of privacy principles, and one of those principles is to prioritize using on-device processing. By performing computations locally on a user’s device, we help minimize the amount of data that is shared with Apple or other entities. Of course, a user may request on-device experiences powered by machine learning (ML) that can be enriched…

Synthetic Query Generation using Large Language Models for Virtual Assistants

This paper was accepted in the Industry Track at SIGIR 2024. Virtual Assistants (VAs) are important Information Retrieval platforms that help users accomplish various tasks through spoken commands. The speech recognition system (speech-to-text) uses query priors, trained solely on text, to distinguish between phonetically confusing alternatives. Hence, the generation of synthetic queries that are similar to existing VA usage can greatly improve…