Apple sponsored the thirty-seventh International Conference on Machine Learning (ICML), which was held virtually from July 12 to 18, 2020. ICML is a leading global conference dedicated to advancing the field of machine learning.

Learn more about ICML

Conference Accepted Papers

Equivariant Neural Rendering

Emilien Dupont, Miguel Angel Bautista, Alex Colburn, Aditya Sankar, Carlos Guestrin, Joshua Susskind, Qi Shan

We propose a framework for learning neural scene representations directly from images, without 3D supervision. Our key insight is that a 3D structure can be imposed by ensuring that the learned representation transforms like a real 3D scene. Specifically, we introduce a loss which enforces equivariance of the scene representation with respect to 3D transformations. Our formulation allows us to infer and render scenes in real time while achieving comparable results to models requiring minutes for inference. In addition, we introduce two challenging new datasets for scene representation and neural rendering, including scenes with complex lighting and backgrounds. Through experiments, we show that our model achieves compelling results on these datasets as well as on standard ShapeNet benchmarks.
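The core of this formulation can be illustrated with a minimal sketch of the equivariance loss, assuming a hypothetical encoder, renderer, and representation-space transformation operator; the names `encode`, `render`, and `transform` below are placeholders, not the paper's implementation.

```python
import numpy as np

def equivariance_loss(encode, render, transform, img_a, img_b, T_ab):
    """Sketch of an equivariance loss: the scene representation inferred from
    view A, transformed by the relative 3D transformation T_ab, should render
    to view B. encode/render/transform are hypothetical model components."""
    scene_a = encode(img_a)                    # infer scene representation from image A
    scene_ab = transform(scene_a, T_ab)        # apply the 3D transformation in representation space
    rendered_b = render(scene_ab)              # render the transformed representation
    return np.mean((rendered_b - img_b) ** 2)  # pixel-wise reconstruction error
```

Minimizing such a loss over pairs of views encourages the learned representation to transform like a real 3D scene, which is the sense in which 3D structure is imposed without 3D supervision.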

AdaScale SGD: A User-Friendly Algorithm for Distributed Training

Tyler B. Johnson, Pulkit Agrawal, Haijie Gu, Carlos Guestrin

When using large-batch training to speed up stochastic gradient descent, learning rates must adapt to new batch sizes in order to maximize speed-ups and preserve model quality. Re-tuning learning rates is resource intensive, while fixed scaling rules often degrade model quality. We propose AdaScale SGD, an algorithm that reliably adapts learning rates to large-batch training. By continually adapting to the gradient's variance, AdaScale automatically achieves speed-ups for a wide range of batch sizes. We formally describe this quality with AdaScale's convergence bound, which maintains final objective values, even as batch sizes grow large and the number of iterations decreases. In empirical comparisons, AdaScale trains well beyond the batch size limits of popular "linear learning rate scaling" rules. This includes large-batch training with no model degradation for machine translation, image classification, object detection, and speech recognition tasks. AdaScale's qualitative behavior is similar to that of "warm-up" heuristics, but unlike warm-up, this behavior emerges naturally from a principled mechanism. The algorithm introduces negligible computational overhead and no new hyperparameters, making AdaScale an attractive choice for large-scale training in practice.
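To make the mechanism concrete, here is a minimal sketch of the gain-ratio idea, under the assumption that the gain is estimated from the variance of the per-worker gradients and the squared norm of their mean. The function names, the single-step estimates, and the small stabilizing constant are illustrative; a practical implementation would smooth these statistics across steps rather than compute them from one batch.

```python
import numpy as np

def adascale_gain(per_worker_grads):
    """Estimate a gain ratio r (between 1 and S) from S per-worker gradients.
    Sketch only: statistics would normally be smoothed, e.g. with moving averages."""
    S = len(per_worker_grads)
    grads = np.stack([g.ravel() for g in per_worker_grads])  # shape (S, num_params)
    mean_grad = grads.mean(axis=0)                           # aggregated large-batch gradient
    var = np.sum((grads - mean_grad) ** 2) / (S - 1)         # gradient variance estimate
    sq_norm = np.sum(mean_grad ** 2)                         # squared norm of the mean gradient
    return (var + sq_norm) / (var / S + sq_norm + 1e-12)     # gain ratio r

def adascale_step(params, per_worker_grads, base_lr, progress):
    """One large-batch SGD update on flat parameters: scale the base learning
    rate by the gain and advance a scale-invariant progress counter by the same
    amount, so training stops after a fixed amount of single-batch progress."""
    r = adascale_gain(per_worker_grads)
    mean_grad = np.mean([g.ravel() for g in per_worker_grads], axis=0)
    params = params - r * base_lr * mean_grad
    return params, progress + r
```

When gradients are noisy relative to their mean, the estimated gain approaches the number of workers S and the step is scaled up; when they agree closely, the gain stays near 1, which is the warm-up-like behavior described above.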

Learning to Branch for Multi-Task Learning

Pengsheng Guo, Chen-Yu Lee, Daniel Ulbricht

Training multiple tasks jointly in one deep network yields reduced latency during inference and better performance over the single-task counterpart by sharing certain layers of a network. However, over-sharing a network could erroneously enforce over-generalization, causing negative knowledge transfer across tasks. Prior works rely on human intuition or pre-computed task relatedness scores for ad hoc branching structures. They provide suboptimal end results and often require huge efforts for the trial-and-error process.

In this work, we present an automated multi-task learning algorithm that learns where to share or branch within a network, designing an effective network topology that is directly optimized for multiple objectives across tasks. Specifically, we propose a novel tree-structured design space that casts a tree branching operation as a Gumbel-Softmax sampling procedure. This enables differentiable network splitting that is end-to-end trainable. We validate the proposed method on controlled synthetic data, CelebA, and Taskonomy.
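As a toy illustration of the branching mechanism (a sketch assuming a two-way branch choice, not the paper's code), a Gumbel-Softmax sample yields a soft, differentiable selection among candidate parent branches; the temperature `tau` and the logit values below are illustrative.

```python
import numpy as np

def gumbel_softmax_sample(logits, tau=1.0, rng=None):
    """Draw a soft, differentiable one-hot sample over branching choices."""
    rng = np.random.default_rng() if rng is None else rng
    u = rng.uniform(1e-10, 1.0, size=logits.shape)   # avoid log(0)
    gumbel_noise = -np.log(-np.log(u))               # Gumbel(0, 1) noise
    y = (logits + gumbel_noise) / tau
    y = y - y.max()                                  # numerical stability
    return np.exp(y) / np.exp(y).sum()               # softmax over candidate branches

# Toy use: a child block softly selects which of two parent branches feeds it.
branch_logits = np.array([0.3, 1.2])                 # learnable routing logits (illustrative)
weights = gumbel_softmax_sample(branch_logits, tau=0.5)
# child_input = weights[0] * parent_a_output + weights[1] * parent_b_output
```

Because the sample is a smooth function of the logits, the routing decision can be trained end to end with gradient descent, and lowering the temperature pushes the soft weights toward a discrete branching choice.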

Talks and Workshops

On-Device Machine Learning Talk at Expo Day

Expo Day, held on July 12, was an opportunity for ICML attendees to see machine learning in practice. At Expo Day this year, Apple presented a talk on on-device machine learning, in which attendees learned how to leverage Apple's on-device machine learning across our integrated hardware, software, and tools to create intelligent experiences.

Automated Machine Learning (AutoML) Workshop

This workshop aims to make AutoML, the research area focused on progressively automating machine learning, more accessible and easier to apply to new problems. Researchers who study domains related to AutoML, such as neural architecture search, hyperparameter optimization, meta-learning, and learning to learn, were encouraged to attend. At the workshop on July 18, Raviteja Anantha presented a five-minute lightning talk on his accepted workshop paper.

AutoML Workshop Accepted Paper:
Generalized Reinforcement Meta Learning for Few-Shot Optimization
Raviteja Anantha, Stephen Pulman, Srinivas Chappidi

Affinity Group Workshops

Apple sponsored the LatinX in AI, Queer in AI, and Women in Machine Learning workshops throughout the week.

At the Women in Machine Learning workshop on July 13, Lizi Ottens shared how our on-device machine learning powers intelligent experiences on Apple products.

Learn more about Apple’s company-wide inclusion and diversity efforts
