Paper | July 2021

Tensor Programs IIb: Architectural Universality of Neural Tangent Kernel Training Dynamics

Authors: Etai Littwin, Greg Yang

Yang (2020a) recently showed that the Neural Tangent Kernel (NTK) at initialization has an infinite-width limit for a large class of architectures, including modern staples such as ResNet and Transformers. However, that analysis does not apply to training. Here, we show that the same neural networks (in the so-called NTK parametrization) follow kernel gradient descent dynamics in function space during training, where the kernel is the infinite-width NTK. This completes the proof of the architectural universality of NTK behavior. To achieve this result, we apply the Tensor Programs technique: we write the entire SGD dynamics inside a Tensor Program and analyze it via the Master Theorem. To facilitate this proof, we develop a graphical notation for Tensor Programs.
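As a rough illustration of what "kernel gradient descent dynamics in function space" means, the sketch below iterates the update f ← f − η K (f − y) on a fixed set of training points under a squared loss. It is not the paper's construction: the RBF kernel is only a stand-in for the infinite-width NTK, and the names `rbf_kernel` and `kernel_gradient_descent`, the toy data, and the step size are all assumptions made for this example.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=1.0):
    """Stand-in positive semi-definite kernel (an assumption; NOT the NTK)."""
    sq_dists = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    return np.exp(-sq_dists / (2.0 * length_scale**2))


def kernel_gradient_descent(K, y, f0, lr, steps):
    """Iterate f <- f - lr * K (f - y) on the training outputs.

    This is gradient descent on 0.5 * ||f - y||^2 in function space,
    with the gradient preconditioned by a fixed kernel matrix K.
    """
    f = f0.copy()
    for _ in range(steps):
        f = f - lr * K @ (f - y)
    return f


# Toy data (hypothetical): 8 points in 3 dimensions with a smooth target.
rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))
y = np.sin(X[:, 0])

K = rbf_kernel(X, X)
f0 = np.zeros(len(y))

# Step size 1 / lambda_max keeps every eigen-direction of the residual contractive.
lr = 1.0 / np.linalg.eigvalsh(K).max()
steps = 2000
f_final = kernel_gradient_descent(K, y, f0, lr, steps)

print("initial residual norm:", np.linalg.norm(f0 - y))
print("final residual norm:  ", np.linalg.norm(f_final - y))
```

Because the kernel is held fixed, the residual has the closed form f_t − y = (I − ηK)^t (f_0 − y), so each eigen-direction of K decays at its own geometric rate. In the setting of the abstract, K would be the infinite-width NTK evaluated on the training inputs.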

Related readings and updates.

Implicit Acceleration and Feature Learning in Infinitely Wide Neural Networks with Bottlenecks

July 12, 2021 | Research areas: Computer Vision, Methods and Algorithms | Workshop at ICML

This paper was accepted at the workshop on Overparameterization: Pitfalls and Opportunities at the ICML 2021 conference.

We analyze the learning dynamics of infinitely wide neural networks with a finite-sized bottleneck. Unlike the neural tangent kernel limit, a bottleneck in an otherwise infinite-width network allows data-dependent feature learning in its bottleneck representation. We empirically show that a single bottleneck in infinite…

Collegial Ensembles

December 2, 2020 | Research area: Methods and Algorithms | Conference: NeurIPS

Modern neural network performance typically improves as model size increases. A recent line of research on the Neural Tangent Kernel (NTK) of over-parameterized networks indicates that the improvement with size increase is a product of a better conditioned loss landscape. In this work, we investigate a form of over-parameterization achieved through ensembling, where we define collegial ensembles (CE) as the aggregation of multiple independent…
