Prosody in human communication and speech technology: A research talk by Sara Ng
About the speaker:
"I am a Visiting Assistant Professor in the Department of Linguistics at Western Washington University. My research focuses on issues in computational models of prosody and speech perception. My primary research concerns how computational methods can be used to better understand the role of prosody in spoken language. I am interested in how humans use tune, rhythm, and pronunciation to convey pragmatic meaning such as discourse structure and conceptual pacts, and how this exchange of information is affected by hearing impairment and speech disorders. My recent work also concerns how large commercial speech recognition systems such as OpenAI's Whisper leverage prosody. I use corpus-based, theory-agnostic, engineering approaches with grounding in attested generalities of language acoustics."
Title:
Prosody in Human Communication and Speech Technology
Abstract:
Speech technology is a ubiquitous part of the modern world, from the voice-enabled assistants in smartphones to bespoke tools used by language researchers. Technological advances and the curation of large speech datasets have enabled such systems to recognize and process spoken language with remarkable quality. However, there is tremendous variation in the linguistic cues purportedly used or learned by such systems.
I argue that the phonetic signal, especially the prosodic information it encodes, is an underutilized source of information in the domain of language technology. In this talk, I will present a sample of my work that shows 1) the benefit that explicit encoding of acoustic-prosodic features can offer to speech technology, and 2) how language scientists can leverage computational tools to better understand this aspect of human communication.
Through a corpus study of English, I analyze the interaction between speakers' informative and pragmatic intentions in speech, and how this impacts their acoustic production. From the domain of speech recognition, I show how acoustic-prosodic features can be used to improve the readability of automatically generated transcripts.
Finally, I will provide a bird's-eye view of how I believe the field of computational linguistics (and my research) is changing, and where I see the benefit of social science expertise to the future of our field.