We present an empirical study on embedding the lyrics of a song into a fixed-dimensional feature for the purpose of music tagging. Five methods of computing token-level representations and four methods of computing document-level representations are trained on an industrial-scale dataset of tens of millions of songs. We compare simple averaging of pretrained embeddings to modern recurrent and attention-based neural architectures. Evaluating on a wide range of tagging tasks such as genre classification, explicit content identification, and era detection, we find that averaging word embeddings outperforms more complex architectures on many downstream metrics.
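
As a rough illustration of the averaging baseline mentioned above, the following minimal sketch maps a tokenized lyric to a fixed-dimensional feature by taking the mean of its word vectors. The embedding table, its three-dimensional vectors, and the function name `average_embedding` are hypothetical and chosen only for readability; they are not taken from the paper, whose pretrained embeddings and preprocessing are not described here.

```python
from typing import Dict, List

# Hypothetical toy embedding table (assumption for illustration only).
# Real pretrained word vectors typically have hundreds of dimensions.
TOY_EMBEDDINGS: Dict[str, List[float]] = {
    "love": [1.0, 0.0, 2.0],
    "night": [0.0, 2.0, 0.0],
    "dance": [2.0, 1.0, 1.0],
}


def average_embedding(
    tokens: List[str], table: Dict[str, List[float]], dim: int
) -> List[float]:
    """Return the mean vector of all tokens found in the table.

    Out-of-vocabulary tokens are skipped; if no token is known,
    a zero vector of length `dim` is returned.
    """
    vectors = [table[t] for t in tokens if t in table]
    if not vectors:
        return [0.0] * dim
    # zip(*vectors) groups the values of each dimension together.
    return [sum(component) / len(vectors) for component in zip(*vectors)]


# Repeated words contribute once per occurrence to the mean.
lyrics = "love dance night love".split()
feature = average_embedding(lyrics, TOY_EMBEDDINGS, dim=3)
print(feature)  # [1.0, 0.75, 1.25]
```

The output has the same length regardless of how long the lyrics are, which is what lets a downstream tagging classifier consume it directly.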

Related readings and updates.

Multi-objective Hyper-parameter Optimization of Behavioral Song Embeddings

Song embeddings are a key component of most music recommendation engines. In this work, we study the hyper-parameter optimization of behavioral song embeddings based on Word2Vec on a selection of downstream tasks, namely next-song recommendation, false neighbor rejection, and artist and genre clustering. We present new optimization objectives and metrics to monitor the effects of hyper-parameter optimization. We show that single-objective…

Can Global Semantic Context Improve Neural Language Models?

Entering text on your iPhone, discovering news articles you might enjoy, finding out answers to questions you may have, and many other language-related tasks depend upon robust natural language processing (NLP) models. Word embeddings are a category of NLP models that mathematically map words to numerical vectors. This capability makes it fairly straightforward to find numerically similar vectors or vector clusters, then reverse the mapping to get relevant linguistic information. Such models are at the heart of familiar apps like News, search, Siri, keyboards, and Maps. In this article, we explore whether we can improve word predictions for the QuickType keyboard using global semantic context.
