Rescribe: Authoring and Automatically Editing Audio Descriptions
authors Amy Pavel, Gabriel Reyes, Jeffrey Bigham
Audio descriptions make videos accessible to those who cannot see them by describing visual content in audio. Producing audio descriptions is challenging due to the synchronous nature of the audio description that must fit into gaps of other video content. An experienced audio description author will produce content that fits narration necessary to understand, enjoy, or experience the video content into the time available. This can be especially tricky for novices to do well. In this paper, we introduce a tool, Rescribe, that helps authors create and refine their audio descriptions. Using Rescribe, authors first create a draft of all the content they would like to include in the audio description. Rescribe then uses a dynamic programming approach to optimize between the length of the audio description, available automatic shortening approaches, and source track lengthening approaches. Authors can iteratively visualize and refine the audio descriptions produced by Rescribe, working in concert with the tool. We evaluate the effectiveness of Rescribe through interviews with blind and visually impaired audio description users who preferred Rescribe-edited descriptions to extended descriptions. In addition, we invite novice users to create audio descriptions with Rescribe and another tool, finding that users produce audio descriptions with fewer placement errors using Rescribe.
Users expect Siri speech recognition to work well regardless of language, device, acoustic environment, or communication channel bandwidth. Like many other supervised machine learning tasks, achieving such high accuracy usually requires large amounts of labeled data. Whenever we launch Siri in a new language, or extend support to different audio channel bandwidths, we face the challenge of having enough data to train our acoustic models. In this article, we discuss transfer learning techniques that leverage data from acoustic models already in production. We show that the representations are transferable not only across languages but also across audio channel bandwidths. As a case study, we focus on recognizing narrowband audio over 8 kHz Bluetooth headsets in new Siri languages. Our techniques help to improve significantly Siri’s accuracy on the day we introduce a new language.