Over its three-decade history, speech translation has experienced several shifts in its primary research themes, moving from loosely coupled cascades of speech recognition and machine translation, to exploring questions of tight coupling, and finally to end-to-end models that have recently attracted much attention. This paper provides a brief survey of these developments, along with a discussion of the main challenges of traditional approaches, which stem from committing to intermediate representations from the speech recognizer and from training cascaded models separately towards different objectives. Recent end-to-end modeling techniques promise a principled way of overcoming these issues by allowing joint training of all model components and removing the need for explicit intermediate representations. However, a closer look reveals that many end-to-end models fall short of solving these issues, due to compromises made to address data scarcity. This paper provides a unifying categorization and nomenclature that covers both traditional and recent approaches and that may help researchers by highlighting both trade-offs and open research questions.
