Paper, December 2023

Transformers Learn Through Gradual Rank Increase

In collaboration with EPFL, Massachusetts Institute of Technology

Authors: Enric Boix-Adsera, Etai Littwin, Emmanuel Abbe, Samy Bengio, Joshua Susskind

We identify incremental learning dynamics in transformers, where the difference between trained and initial weights progressively increases in rank. We rigorously prove this occurs under the simplifying assumptions of diagonal weight matrices and small initialization. Our experiments support the theory and also show that the phenomenon can occur in practice without the simplifying assumptions.
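As a rough illustration of the quantity the abstract refers to, the sketch below computes a numerical rank for the difference between a weight matrix at some checkpoint and its initial value. It is not the authors' code: the function names, the relative tolerance `rel_tol`, and the synthetic "checkpoints" (built by adding one random rank-one term at a time) are all assumptions made for this example.

```python
import numpy as np


def numerical_rank(matrix, rel_tol=1e-6):
    """Count singular values larger than rel_tol times the largest one."""
    singular_values = np.linalg.svd(matrix, compute_uv=False)
    if singular_values.size == 0 or singular_values[0] == 0.0:
        return 0
    return int(np.sum(singular_values > rel_tol * singular_values[0]))


def update_rank(w_init, w_trained, rel_tol=1e-6):
    """Numerical rank of the difference between trained and initial weights."""
    return numerical_rank(w_trained - w_init, rel_tol=rel_tol)


# Synthetic stand-in for training checkpoints (hypothetical, not real
# transformer weights): each step adds one random rank-one term.
rng = np.random.default_rng(0)
dim = 16
w_init = 0.01 * rng.standard_normal((dim, dim))  # small initialization

checkpoints = []
w = w_init.copy()
for _ in range(3):
    u = rng.standard_normal((dim, 1))
    v = rng.standard_normal((dim, 1))
    w = w + u @ v.T
    checkpoints.append(w.copy())

ranks = [update_rank(w_init, w_t) for w_t in checkpoints]
print(ranks)
```

By construction, the measured rank grows by one at each synthetic checkpoint. With weights saved from an actual training run, the same `update_rank` helper could be applied per layer and per checkpoint, although the choice of `rel_tol` would then matter, since real updates are rarely exactly low rank.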

Related readings and updates.

July 15, 2024. Research areas: Computer Vision, Methods and Algorithms. Workshop at ICML

This paper has been accepted at the Efficient Systems for Foundation Models workshop at ICML 2024. In this work, we study how well the learned weights of a neural network utilize the space available to them. This notion is related to capacity, but additionally incorporates the interaction of the network architecture with the dataset. Most learned weights appear to be full rank, and are therefore not amenable to low rank decomposition. This…

June 4, 2024. Research areas: Knowledge Bases and Search, Methods and Algorithms

Ranking is a fundamental and popular problem in search. However, existing ranking algorithms usually restrict the granularity of ranking to full passages or require a specific dense index for each desired level of granularity. Such lack of flexibility in granularity negatively affects many applications that can benefit from more granular ranking, such as sentence-level ranking for open-domain question-answering, or proposition-level ranking for…

Revealing the Utilized Rank of Subspaces of Learning in Neural Networks

AGRaME: Any Granularity Ranking with Multi-Vector Embeddings
