View publication

While researchers have examined alternative (alt) text for social media and news contexts, few have studied the status and challenges for authoring alt text of figures in computing-related publications. These figures are distinct, often conveying dense visual information, and may necessitate unique accessibility solutions. Accordingly, we explored how to support authors in creating alt text in computing publications---specifically in the field of human-computer interaction (HCI). We conducted two studies: (1) an analysis of 300 recently published figures at a general HCI conference (ACM CHI), and (2) interviews with 10 researchers in HCI and related fields who have varying levels of experience writing alt text. Our findings characterize the prevalence, quality, and patterns of recent figure alt text and captions. We further identify challenges authors encounter, describing their workflow barriers and confusions around how to compose alt text for complex figures. We conclude by outlining a research agenda on process, education, and tooling opportunities to improve alt text in computing-related publications.

Related readings and updates.

Flexible Keyword Spotting based on Homogeneous Audio-Text Embedding

Spotting user-defined/flexible keywords represented in text frequently uses an expensive text encoder for joint analysis with an audio encoder in an embedding space, which can suffer from heterogeneous modality representation (for example, large mismatch) and increased complexity. In this work, we propose a novel architecture to efficiently detect arbitrary keywords based on an audio-compliant text encoder, which inherently has homogeneous…
See paper details

Matching Latent Encoding for Audio-Text based Keyword Spotting

Using audio and text embeddings jointly for Keyword Spotting (KWS) has shown high-quality results, but the key challenge of how to semantically align two embeddings for multi-word keywords of different sequence lengths remains largely unsolved. In this paper, we propose an audio-text-based end-to-end model architecture for flexible keyword spotting (KWS), which builds upon learned audio and text embeddings. Our architecture uses a novel…
See paper details