Analysis of Talker Characteristics in Audio-visual Speech Integration
Keywords: audio-visual speech integration
Publisher: The Ohio State University
Series/Report no.: The Ohio State University. Department of Speech and Hearing Science Honors Theses; 2008
Speech perception is commonly thought of as a purely auditory process, but in actuality it is a multimodal process that integrates both auditory and visual information. In situations where the auditory signal is compromised, such as by a hearing impairment or a noisy environment, visual cues help listeners fill in missing auditory information during communication. Interestingly, even when the auditory and visual cues are each entirely comprehensible on their own, both are taken into account during speech perception. McGurk and MacDonald (1976) demonstrated that listeners not only benefit from visual cues when auditory information is lacking, but also that speech perception naturally employs audio-visual integration whenever both cues are available. Although a growing body of research has demonstrated that listeners integrate auditory and visual information during speech perception, listeners vary substantially in how much they integrate the two modalities and how much benefit they gain. Grant and Seitz (1998) demonstrated that this variability in audio-visual speech integration is, in part, a result of individual differences in listeners' multimodal integration ability. We suggest that characteristics of the auditory signal and of the individual talker might also influence the audio-visual speech integration process (Andrews, 2007; Hungerford, 2007; Huffman, 2007). Research from our lab has demonstrated substantial variability in listeners' performance on degraded auditory-only and audio-visual speech perception tasks. Furthermore, these studies have revealed substantial variability across talkers in the degree of integration they elicit. The amount of information in the auditory signal clearly affects audio-visual integration.
However, to fully understand how different talkers and the varying information in the auditory signal affect audio-visual performance, the speech waveform must be analyzed so that acoustic characteristics can be compared directly with subject performance. The present study conducted a spectrographic analysis of the speech syllables of the talkers used in a previous perception study to evaluate their individual acoustic characteristics. Behavioral confusion matrices constructed from listener responses allowed us to examine the confusions listeners demonstrated. Some of these confusions were readily explained by examining syllable formant tracks; others were plausibly explained by noise, introduced when the stimuli were degraded, obscuring subtle differences in the voice onset time of the confused syllables. Still other confusions were not easily explained by the analysis completed in the present study. The results provide a foundation for understanding which aspects of the acoustic waveform and talker qualities are desirable for optimal audio-visual speech integration, and may also have implications for the design of future aural rehabilitation programs.
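The confusion-matrix step described above can be sketched as follows; this is a minimal illustration, and the syllable labels, trial data, and function name are hypothetical rather than taken from the original study:

```python
from collections import Counter

def confusion_matrix(trials, syllables):
    """Tally (presented, perceived) syllable pairs into a matrix
    with one row per presented syllable, one column per response."""
    counts = Counter(trials)
    return [[counts[(p, r)] for r in syllables] for p in syllables]

# Hypothetical listener responses to degraded /ba/, /da/, /ga/ tokens.
trials = [("ba", "ba"), ("ba", "da"), ("da", "da"),
          ("ga", "da"), ("ga", "ga"), ("ba", "ba")]
syllables = ["ba", "da", "ga"]
matrix = confusion_matrix(trials, syllables)
# Off-diagonal cells mark confusions worth checking against formant
# tracks and voice onset time in the acoustic analysis.
```

In this toy example, the nonzero off-diagonal cells (e.g. /ba/ and /ga/ each heard as /da/) are the entries that a spectrographic analysis would then try to explain acoustically.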