
  <rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
    <channel>
      <title>Apple Machine Learning Research</title>
      <link>https://machinelearning.apple.com</link>
      <description>Apple machine learning teams are engaged in state-of-the-art research in machine learning and artificial intelligence. Learn about the latest advancements.</description>
      <language>en</language>
      <lastBuildDate>Thu, 09 Apr 2026 00:00:00 GMT</lastBuildDate>
      <atom:link href="https://machinelearning.apple.com/rss.xml" rel="self" type="application/rss+xml"/>
      
  <item>
    <guid>neighbor</guid>
    <title>A Theoretical Framework for Acoustic Neighbor Embeddings</title>
    <link>https://machinelearning.apple.com/research/neighbor</link>
    <description>This paper provides a theoretical framework for interpreting acoustic neighbor embeddings, which are representations of the phonetic content of variable-width audio or text in a fixed-dimensional embedding space. A probabilistic interpretation of the distances between embeddings is proposed, based on a general quantitative definition of phonetic similarity between words. This gives us a framework for understanding and applying the embeddings in a principled manner. Theoretical and empirical evidence supporting an approximation of uniform cluster-wise isotropy is presented, which allows us to…</description>
    <pubDate>Thu, 09 Apr 2026 00:00:00 GMT</pubDate>
  </item>

  <item>
    <guid>lacy</guid>
    <title>LaCy: What Small Language Models Can and Should Learn is Not Just a Question of Loss</title>
    <link>https://machinelearning.apple.com/research/lacy</link>
    <description>This paper was accepted at the Workshop on Memory for LLM-Based Agentic Systems at ICLR.

Language models have consistently grown to compress more world knowledge into their parameters, but the knowledge that can be pretrained into them is upper-bounded by their parameter size. The capacity of Small Language Models (SLMs) in particular is limited, leading to factually incorrect generations. This problem is often mitigated by giving the SLM access to an outside source: the ability to query a larger model, documents, or a database. Under this setting, we study the fundamental question of which…</description>
    <pubDate>Thu, 09 Apr 2026 00:00:00 GMT</pubDate>
  </item>

  <item>
    <guid>governance-aware-agent-telemetry</guid>
    <title>Governance-Aware Agent Telemetry for Closed-Loop Enforcement in Multi-Agent AI Systems</title>
    <link>https://machinelearning.apple.com/research/governance-aware-agent-telemetry</link>
    <description>Enterprise multi-agent AI systems produce thousands of inter-agent interactions per hour, yet existing observability tools capture these dependencies without enforcing anything. OpenTelemetry and Langfuse collect telemetry but treat governance as a downstream analytics concern, not a real-time enforcement target. The result is an “observe-but-do-not-act” gap where policy violations are detected only after damage is done. We present Governance-Aware Agent Telemetry (GAAT), a reference architecture that closes the loop between telemetry collection and automated policy enforcement for multi-agent…</description>
    <pubDate>Wed, 08 Apr 2026 00:00:00 GMT</pubDate>
  </item>

  <item>
    <guid>squire</guid>
    <title>SQUIRE: Interactive UI Authoring via Slot QUery Intermediate REpresentations</title>
    <link>https://machinelearning.apple.com/research/squire</link>
    <description>Frontend developers create UI prototypes to evaluate alternatives, which is a time-consuming process of repeated iteration and refinement. Generative AI code assistants enable rapid prototyping simply by prompting through a chat interface rather than writing code. However, while this interaction gives developers flexibility since they can write any prompt they wish, it makes it challenging to control what is generated. First, natural language on its own can be ambiguous, making it difficult for developers to precisely communicate their intentions. Second, the model may respond unpredictably…</description>
    <pubDate>Mon, 06 Apr 2026 00:00:00 GMT</pubDate>
  </item>

  <item>
    <guid>personalized-group</guid>
    <title>Personalized Group Relative Policy Optimization for Heterogeneous Preference Alignment</title>
    <link>https://machinelearning.apple.com/research/personalized-group</link>
    <description>Despite their sophisticated general-purpose capabilities, Large Language Models (LLMs) often fail to align with diverse individual preferences because standard post-training methods, like Reinforcement Learning with Human Feedback (RLHF), optimize for a single, global objective. While Group Relative Policy Optimization (GRPO) is a widely adopted on-policy reinforcement learning framework, its group-based normalization implicitly assumes that all samples are exchangeable, inheriting this limitation in personalized settings. This assumption conflates distinct user reward distributions and…</description>
    <pubDate>Thu, 02 Apr 2026 00:00:00 GMT</pubDate>
  </item>

  <item>
    <guid>protext-gender-bias-benchmark</guid>
    <title>ProText: A Benchmark Dataset for Measuring (Mis)gendering in Long-Form Texts</title>
    <link>https://machinelearning.apple.com/research/protext-gender-bias-benchmark</link>
    <description>We introduce ProText, a dataset for measuring gendering and misgendering in stylistically diverse long-form English texts. ProText spans three dimensions: Theme nouns (names, occupations, titles, kinship terms), Theme category (stereotypically male, stereotypically female, gender-neutral/non-gendered), and Pronoun category (masculine, feminine, gender-neutral, none). The dataset is designed to probe (mis)gendering in text transformations such as summarization and rewrites using state-of-the-art Large Language Models, extending beyond traditional pronoun resolution benchmarks and beyond the…</description>
    <pubDate>Tue, 31 Mar 2026 00:00:00 GMT</pubDate>
  </item>

  <item>
    <guid>entropy-preserving-reinforcement-learning</guid>
    <title>Entropy-Preserving Reinforcement Learning</title>
    <link>https://machinelearning.apple.com/research/entropy-preserving-reinforcement-learning</link>
    <description>Policy gradient algorithms have driven many recent advancements in language model reasoning. An appealing property is their ability to learn from exploration on their own trajectories, a process crucial for fostering diverse and creative solutions. As we show in this paper, many policy gradient algorithms naturally reduce the entropy—and thus the diversity of explored trajectories—as part of training, yielding a policy increasingly limited in its ability to explore. In this paper, we argue that entropy should be actively monitored and controlled throughout training. We formally analyze the…</description>
    <pubDate>Mon, 30 Mar 2026 00:00:00 GMT</pubDate>
  </item>

  <item>
    <guid>beyond-real-data</guid>
    <title>Beyond Real Data: Synthetic Data through the Lens of Regularization</title>
    <link>https://machinelearning.apple.com/research/beyond-real-data</link>
    <description>Synthetic data can improve generalization when real data is scarce, but excessive reliance may introduce distributional mismatches that degrade performance. In this paper, we present a learning-theoretic framework to quantify the trade-off between synthetic and real data. Our approach leverages algorithmic stability to derive generalization error bounds, characterizing the optimal synthetic-to-real data ratio that minimizes expected test error as a function of the Wasserstein distance between the real and synthetic distributions. We motivate our framework in the setting of kernel ridge…</description>
    <pubDate>Mon, 30 Mar 2026 00:00:00 GMT</pubDate>
  </item>

  <item>
    <guid>less-gaussians-texture-more</guid>
    <title>Less Gaussians, Texture More: 4K Feed-Forward Textured Splatting</title>
    <link>https://machinelearning.apple.com/research/less-gaussians-texture-more</link>
    <description>Existing feed-forward 3D Gaussian Splatting methods predict pixel-aligned primitives, leading to a quadratic growth in primitive count as resolution increases. This fundamentally limits their scalability, making high-resolution synthesis such as 4K intractable. We introduce LGTM (Less Gaussians, Texture More), a feed-forward framework that overcomes this resolution scaling barrier. By predicting compact Gaussian primitives coupled with per-primitive textures, LGTM decouples geometric complexity from rendering resolution. This approach enables high-fidelity 4K novel view synthesis without…</description>
    <pubDate>Sat, 28 Mar 2026 00:00:00 GMT</pubDate>
  </item>

  <item>
    <guid>athena</guid>
    <title>Athena: Intermediate Representations for Iterative Scaffolded App Generation with an LLM</title>
    <link>https://machinelearning.apple.com/research/athena</link>
    <description>It is challenging to generate the code for a complete user interface using a Large Language Model (LLM). User interfaces are complex, and their implementations often consist of multiple interrelated files that together specify the contents of each screen, the navigation flows between the screens, and the data model used throughout the application. It is challenging to craft a single prompt for an LLM that contains enough detail to generate a complete user interface, and even then the result is frequently a single large, difficult-to-understand file that contains all of the generated…</description>
    <pubDate>Fri, 27 Mar 2026 00:00:00 GMT</pubDate>
  </item>

    </channel>
  </rss>
