View publication

In deep learning, embeddings are widely used to represent categorical entities such as words, apps, and movies. An embedding layer maps each entity to a unique vector, causing the layer’s memory requirement to be proportional to the number of entities. In the recommendation domain, a given category can have hundreds of thousands of entities, and its embedding layer can take gigabytes of memory. The scale of these networks makes them difficult to deploy in resource constrained environments, such as smartphones. In this paper, we propose a novel approach for reducing the size of an embedding table while still mapping each entity to its own unique embedding. Rather than maintaining the full embedding table, we construct each entity’s embedding “on the fly” using two separate embedding tables. The first table employs hashing to force multiple entities to share an embedding. The second table contains one trainable weight per entity, allowing the model to distinguish between entities sharing the same embedding. Since these two tables are trained jointly, the network is able to learn a unique embedding per entity, helping it maintain a discriminative capability similar to a model with an uncompressed embedding table. We call this approach MEmCom (Multi-Embedding Compression). We compare with state-of-the-art model compression techniques for multiple problem classes including classification and ranking using datasets from various domains. On four popular recommender system datasets, MEmCom had a 4% relative loss in nDCG while compressing the input embedding sizes of our recommendation models by 16x, 4x, 12x, and 40x. MEmCom outperforms the state-of-the-art model compression techniques, which achieved 16%, 6%, 10%, and 8% relative loss in nDCG at the respective compression ratios. Additionally, MEmCom is able to compress the RankNet ranking model by 32x on a dataset with millions of users’ interactions with games while incurring only a 1% relative loss in nDCG.

Related readings and updates.

Single Training Dimension Selection for Word Embedding with PCA

In this paper, we present a fast and reliable method based on PCA to select the number of dimensions for word embeddings. First, we train one embedding with a generous upper bound (e.g. 1,000) of dimensions. Then we transform the embeddings using PCA and incrementally remove the lesser dimensions one at a time while recording the embeddings' performance on language tasks. Lastly, we select the number of dimensions while balancing model size and…
See paper details

Can Global Semantic Context Improve Neural Language Models?

Entering text on your iPhone, discovering news articles you might enjoy, finding out answers to questions you may have, and many other language-related tasks depend upon robust natural language processing (NLP) models. Word embeddings are a category of NLP models that mathematically map words to numerical vectors. This capability makes it fairly straightforward to find numerically similar vectors or vector clusters, then reverse the mapping to get relevant linguistic information. Such models are at the heart of familiar apps like News, search, Siri, keyboards, and Maps. In this article, we explore whether we can improve word predictions for the QuickType keyboard using global semantic context.

See article details