We introduce the Data and Network Introspection toolkit, DNIKit, an open-source Python framework for analyzing machine learning models and datasets. DNIKit contains a collection of algorithms that all operate on intermediate network responses, providing unique insight into how a network perceives data throughout the different stages of computation.

With DNIKit, you can:

  • create a comprehensive dataset analysis report
  • find dataset samples that are near duplicates of each other
  • discover rare data samples, annotation errors, or model biases
  • compress networks by removing highly correlated neurons
  • detect inactive units in a model
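Several of these analyses boil down to simple statistics over a layer's activations across a dataset. As a minimal sketch (not DNIKit's actual API; all names and thresholds here are illustrative), inactive units can be found as activations with near-zero variance, and compression candidates as pairs of neurons whose activations are highly correlated:

```python
import numpy as np

# Toy activation matrix: rows = dataset samples, columns = neurons
# in one layer. We plant one inactive unit and one redundant unit.
rng = np.random.default_rng(0)
acts = rng.normal(size=(200, 6))
acts[:, 3] = 0.0                      # an inactive (always-zero) unit
acts[:, 5] = 2.0 * acts[:, 0] + 0.01  # nearly redundant with neuron 0

# Inactive units: (near-)zero variance across the whole dataset.
inactive = np.where(acts.std(axis=0) < 1e-6)[0]

# Highly correlated pairs among the remaining units: candidates
# for pruning when compressing the network.
active_idx = np.where(acts.std(axis=0) >= 1e-6)[0]
corr = np.corrcoef(acts[:, active_idx], rowvar=False)
pairs = [(int(active_idx[i]), int(active_idx[j]))
         for i in range(len(active_idx))
         for j in range(i + 1, len(active_idx))
         if abs(corr[i, j]) > 0.95]
```

Here `inactive` flags neuron 3 and `pairs` flags the (0, 5) pair; DNIKit's actual introspectors operate on the same kind of per-layer response data at scale.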

To visualize certain analyses, DNIKit also works with Symphony, a research platform for creating interactive data science components that we originally published at ACM CHI 2022. Now open source, Symphony components enable multiple stakeholders on cross-functional AI/ML teams to explore, visualize, and share AI/ML analyses. Symphony supports a variety of data types and models, and runs across platforms, from Jupyter notebooks to standalone web-based dashboards. Symphony also includes components for visualizing the results of DNIKit analyses, such as dataset familiarity and duplicate detection.

We use Symphony together with DNIKit for interactive, visual dataset analysis, most notably in the Dataset Report.

Demonstration of extracting model responses to feed into DNIKit algorithms.
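The idea of feeding intermediate responses into downstream algorithms can be sketched with a toy two-layer network (this is a hypothetical illustration in plain NumPy, not DNIKit's real extraction API): run a batch through the model and record each layer's output along the way.

```python
import numpy as np

# Toy two-layer MLP whose per-layer outputs we capture, the kind of
# intermediate responses DNIKit algorithms consume.
rng = np.random.default_rng(1)
w1 = rng.normal(size=(4, 8))   # input -> hidden
w2 = rng.normal(size=(8, 3))   # hidden -> output

def forward(x, responses):
    h = np.maximum(x @ w1, 0.0)   # hidden layer (ReLU)
    responses["hidden"] = h       # capture intermediate response
    out = h @ w2
    responses["output"] = out     # capture final response
    return out

responses = {}
batch = rng.normal(size=(16, 4))  # 16 dataset samples
forward(batch, responses)
```

After the pass, `responses` holds one activation matrix per layer, ready for analyses such as duplicate detection or familiarity scoring. In a real framework like PyTorch, the same capture is typically done with forward hooks rather than an explicit `responses` argument.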

Related readings and updates.

Symphony: Composing Interactive Interfaces for Machine Learning

Interfaces for machine learning (ML), which present information and visualizations about models or data, can help practitioners build robust and responsible ML systems. Despite their benefits, recent studies of ML teams and our interviews with practitioners (n=9) showed that ML interfaces have limited adoption in practice. While existing ML interfaces are effective for specific tasks, they are not designed to be reused, explored, and shared by multiple…
See paper details

Deploying Transformers on the Apple Neural Engine

An increasing number of the machine learning (ML) models we build at Apple each year are either partly or fully adopting the Transformer architecture. This architecture helps enable experiences such as panoptic segmentation in Camera with HyperDETR, on-device scene analysis in Photos, image captioning for accessibility, machine translation, and many others. This year at WWDC 2022, Apple is making available an open-source reference PyTorch implementation of the Transformer architecture, giving developers worldwide a way to seamlessly deploy their state-of-the-art Transformer models on Apple devices.

See highlight details