Despite increasing awareness of the need to support accessibility in mobile apps, many still lack support for key accessibility features. Developers and quality assurance (QA) testers often rely on manual testing to verify accessibility features throughout the product lifecycle. However, manual testing can be tedious, often has an overwhelming scope, and test passes can be difficult to schedule amongst other development milestones. Recently, Large Language Models (LLMs) have been used for a variety of tasks, including automation of UIs; however, none have yet explored their use in controlling assistive technologies to support accessibility testing. In this paper, we explore the requirements of a natural language-based accessibility testing workflow through a formative study. Based on this, we present a system that takes as input a manual accessibility test (e.g., "Search for a show in VoiceOver") and uses an LLM combined with pixel-based UI understanding models to convert the test into a chaptered, navigable video. In each video, we apply heuristics to detect and flag accessibility issues (e.g., text size not increasing with Large Text enabled, VoiceOver navigation loops) to help QA testers more easily pinpoint problems. We evaluate this system through a 10-participant user study with accessibility QA professionals, who indicated that the tool would be very useful in their current work and suggested several promising directions for future work.
Numerous accessibility features have been developed and included in consumer operating systems to give people with a variety of disabilities additional ways to access computing devices. Unfortunately, many users, especially older adults who are more likely to experience ability changes, are not aware of these features or do not know which combination to use. In this paper, we first quantify this problem via a survey with 100 participants…
Many accessibility features available on mobile platforms require applications (apps) to provide complete and accurate metadata describing user interface (UI) components. Unfortunately, many apps do not provide sufficient metadata for accessibility features to work as expected. In this paper, we explore inferring accessibility metadata for mobile apps from their pixels, as the visual interfaces often best reflect an app's full functionality. We…