June 2021

Leveraging ML Compute for Accelerated Training on Mac

Update: You can now leverage Apple’s tensorflow-metal PluggableDevice in TensorFlow v2.5 for accelerated training on Mac GPUs directly with Metal. Get started with tensorflow-metal.
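As a quick check after installing tensorflow-metal, the sketch below lists the GPU devices TensorFlow can see. It uses the standard `tf.config.list_physical_devices` call. The helper name `visible_gpus` is hypothetical, and the assumption that the Metal plugin shows up as a `GPU` device should be confirmed against the tensorflow-metal documentation.

```python
def visible_gpus():
    """Return the names of GPU devices visible to TensorFlow.

    Returns an empty list if TensorFlow is not installed or no GPU
    device is registered. `visible_gpus` is a hypothetical helper name.
    """
    try:
        import tensorflow as tf
    except ImportError:
        # TensorFlow is not available in this environment.
        return []
    # Assumption: with tensorflow-metal installed, the Mac GPU is
    # registered under the "GPU" device type.
    return [device.name for device in tf.config.list_physical_devices("GPU")]


if __name__ == "__main__":
    gpus = visible_gpus()
    print("GPU devices:", gpus if gpus else "none found")
```

An empty result means either that TensorFlow could not be imported or that no GPU device was registered, so it is worth checking the installation before concluding that training will fall back to the CPU.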

The Mac has long been a popular platform for developers, engineers, and researchers. Now, with Macs powered by the all-new M1 chip and the ML Compute framework available in macOS Big Sur, neural networks can be trained right on the Mac with a huge leap in performance.

ML Compute

Until now, TensorFlow has used only the CPU for training on Mac. The new tensorflow_macos fork of TensorFlow 2.4 leverages ML Compute to enable machine learning libraries to take full advantage of not only the CPU, but also the GPU in both M1- and Intel-powered Macs for dramatically faster training performance. ML Compute starts by applying higher-level optimizations, such as fusing layers and selecting the appropriate device type, and then compiles and executes the graph as primitives that are accelerated by BNNS on the CPU and Metal Performance Shaders on the GPU.
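To make "fusing layers" concrete, here is a minimal, framework-free sketch of one common fusion: folding a batch-normalization step with fixed statistics into the dense layer that precedes it, so two operations become one. This illustrates the general idea only. It is not a description of the specific fusions ML Compute performs, and all function names and values are hypothetical.

```python
import math


def dense(x, weights, bias):
    """Affine layer: one output per row of `weights`."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def batch_norm(y, gamma, beta, mean, var, eps=1e-5):
    """Per-feature normalization with fixed (already computed) statistics."""
    return [g * (yi - m) / math.sqrt(v + eps) + bt
            for yi, g, bt, m, v in zip(y, gamma, beta, mean, var)]


def fuse_dense_batch_norm(weights, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold batch norm into the dense layer's weights and bias."""
    scales = [g / math.sqrt(v + eps) for g, v in zip(gamma, var)]
    fused_weights = [[s * w for w in row] for s, row in zip(scales, weights)]
    fused_bias = [s * (b - m) + bt
                  for s, b, m, bt in zip(scales, bias, mean, beta)]
    return fused_weights, fused_bias


# Hypothetical example values, chosen only for illustration.
weights = [[0.5, -1.0], [2.0, 0.25]]
bias = [0.1, -0.2]
gamma, beta = [1.5, 0.8], [0.0, 0.3]
mean, var = [0.2, -0.1], [1.0, 4.0]
x = [1.0, 2.0]

unfused = batch_norm(dense(x, weights, bias), gamma, beta, mean, var)
fused_weights, fused_bias = fuse_dense_batch_norm(
    weights, bias, gamma, beta, mean, var)
fused = dense(x, fused_weights, fused_bias)

# The two paths should agree up to floating-point rounding.
print(all(math.isclose(a, b, rel_tol=1e-9) for a, b in zip(unfused, fused)))
```

The fused layer does the same arithmetic in a single pass over the data, which is the kind of saving a graph-level optimizer looks for before handing work to CPU or GPU primitives.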

Training Performance with Mac-optimized TensorFlow

Performance benchmarks for Mac-optimized TensorFlow training show significant speedups for common models across M1- and Intel-powered Macs when leveraging the GPU for training. For example, TensorFlow users can now get up to 7x faster training on the new 13-inch MacBook Pro with M1:

Chart comparing three performance benchmarks: one running TensorFlow 2.3 on 2020 13” MacBook Pro with Intel, another running Accelerated TensorFlow 2.4 on 2020 13” MacBook Pro with Intel, and a third running Accelerated TensorFlow 2.4 on 2020 13” MacBook Pro with M1, showing up to 7x faster training. Footnote 1 provides more details.

Training impact on common models using ML Compute on M1- and Intel-powered 13-inch MacBook Pro is shown in seconds per batch, with lower numbers indicating faster training time.

Chart comparing two performance benchmarks: one running TensorFlow 2.3 on 2019 Mac Pro and another running Accelerated TensorFlow 2.4 on 2019 Mac Pro, showing up to 7x faster training for common models. Footnote 2 provides more details.

Training impact on common models using ML Compute on the Intel-powered 2019 Mac Pro is shown in seconds per batch, with lower numbers indicating faster training time.
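For readers who want to produce a comparable measurement on their own machine, the sketch below times an arbitrary training step in seconds per batch and converts two such timings into a speedup factor. The helper names are hypothetical, and the numbers in the usage lines are placeholders, not values taken from the charts.

```python
import time


def seconds_per_batch(train_step, batches, warmup=1):
    """Average wall-clock seconds per call to `train_step`.

    The first `warmup` batches are run but excluded from timing, since
    initial calls often include one-time setup such as graph compilation.
    """
    batches = list(batches)
    for batch in batches[:warmup]:
        train_step(batch)
    timed = batches[warmup:]
    if not timed:
        raise ValueError("need more batches than warmup steps")
    start = time.perf_counter()
    for batch in timed:
        train_step(batch)
    return (time.perf_counter() - start) / len(timed)


def speedup(baseline_seconds, accelerated_seconds):
    """How many times faster the accelerated run is (lower time is better)."""
    return baseline_seconds / accelerated_seconds


if __name__ == "__main__":
    # Placeholder step that does no real training; swap in a real one.
    elapsed = seconds_per_batch(lambda batch: sum(batch), [[1, 2, 3]] * 10)
    print(f"{elapsed:.6f} seconds per batch")
    # Placeholder timings, NOT from the benchmarks above.
    print(f"{speedup(1.4, 0.2):.1f}x faster")
```

Because the metric is time per batch, a speedup is the baseline time divided by the accelerated time, so comparisons are only meaningful when both runs use the same model and batch size.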

Getting started with Mac-optimized TensorFlow

To start using Mac-optimized TensorFlow, visit the tensorflow_macos GitHub repository. You can also visit TensorFlow’s blog post to learn more.
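Once the fork is installed, device selection can be requested from Python. The sketch below assumes the `mlcompute` module and its `set_mlc_device` function as described in the tensorflow_macos README for the TensorFlow 2.4 fork, so treat the import path and the accepted values (`'cpu'`, `'gpu'`, `'any'`) as assumptions to verify against the repository. The wrapper `select_mlc_device` is a hypothetical helper that degrades gracefully when the fork is not present.

```python
def select_mlc_device(device_name="gpu"):
    """Ask ML Compute to train on 'cpu', 'gpu', or 'any'.

    Returns True if the request was made, or False if the
    tensorflow_macos fork (and its mlcompute module) is not installed.
    """
    if device_name not in ("cpu", "gpu", "any"):
        raise ValueError(f"unsupported device name: {device_name!r}")
    try:
        # Assumption: this module exists only in the tensorflow_macos fork.
        from tensorflow.python.compiler.mlcompute import mlcompute
    except ImportError:
        return False
    mlcompute.set_mlc_device(device_name=device_name)
    return True


if __name__ == "__main__":
    if select_mlc_device("gpu"):
        print("ML Compute device set to GPU")
    else:
        print("tensorflow_macos not found; using default TensorFlow devices")
```

Calling the helper before building a model keeps the device choice in one place, and the `False` return makes it easy to run the same script on machines without the fork.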
