
Machine learning models have been trained to predict semantic information about user interfaces (UIs) to make apps more accessible and easier to test and automate. Currently, most models rely on datasets that are collected and labeled by human crowd-workers, a process that is costly and surprisingly error-prone for certain tasks. For example, it is possible to guess whether a UI element is "tappable" from a screenshot (i.e., based on visual signifiers) or from potentially unreliable metadata (e.g., a view hierarchy), but one way to know for certain is to programmatically tap the UI element and observe the effects. We built the Never-ending UI Learner, an app crawler that automatically installs real apps from a mobile app store and crawls them to discover new and challenging training examples to learn from. The Never-ending UI Learner has crawled for more than 5,000 device hours, performing over half a million actions on 6,000 apps to train three computer vision models for predicting tappability, draggability, and screen similarity.
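
To illustrate the tap-and-observe idea, here is a minimal sketch in Swift assuming an XCUITest harness. The element selection and the raw byte comparison of before/after screenshots are simplified stand-ins (the actual crawler selects candidate elements systematically and uses a learned screen-similarity model rather than pixel equality), and the class name `TappabilityProbe` is hypothetical.

```swift
import XCTest

// A simplified sketch (not the paper's implementation) of labeling an element
// as "tappable" by interacting with it and observing the effect, rather than
// relying on visual signifiers or view-hierarchy metadata alone.
final class TappabilityProbe: XCTestCase {
    func testProbeTappability() {
        let app = XCUIApplication()   // assumes the target app is installed on the device
        app.launch()

        // Capture the screen before the interaction.
        let before = XCUIScreen.main.screenshot().pngRepresentation

        // Pick one candidate element; a real crawler would iterate over all
        // on-screen elements discovered from the accessibility tree.
        let candidate = app.descendants(matching: .any).element(boundBy: 0)
        guard candidate.exists, candidate.isHittable else { return }
        candidate.tap()

        // Capture the screen after the interaction and compare.
        let after = XCUIScreen.main.screenshot().pngRepresentation

        // Crude proxy: if any pixels changed, treat the tap as having had an
        // effect. The crawler described above uses a screen-similarity model
        // instead of raw byte equality to decide whether the screen changed.
        let tappable = (before != after)
        print("Element labeled tappable: \(tappable)")
    }
}
```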

Related readings and updates.

Making Mobile Applications Accessible with Machine Learning

At Apple we use machine learning to teach our products to understand the world more as humans do. Of course, understanding the world better means building great assistive experiences. Machine learning can help our products be intelligent and intuitive enough to improve the day-to-day experiences of people living with disabilities. We can build machine-learned features that support a wide range of users including those who are blind or have low vision, those who are deaf or are hard of hearing, those with physical motor limitations, and also support those with cognitive disabilities.


Screen Recognition: Creating Accessibility Metadata for Mobile Applications from Pixels

Many accessibility features available on mobile platforms require applications (apps) to provide complete and accurate metadata describing user interface (UI) components. Unfortunately, many apps do not provide sufficient metadata for accessibility features to work as expected. In this paper, we explore inferring accessibility metadata for mobile apps from their pixels, as the visual interfaces often best reflect an app's full functionality. We…