Training Effects in Audio-visual Integration of Sine Wave Speech
Abstract
Speech perception is a bimodal process that involves both auditory and visual inputs. The auditory signal typically provides enough information on its own; however, when it is compromised, such as when listening in a noisy environment or due to hearing loss, listeners rely on visual cues to aid understanding. Visual cues have been shown to significantly improve speech perception when the auditory signal is degraded. McGurk and MacDonald (1976) demonstrated that speech perception is not a purely auditory process and that visual information exerts an influence even when the auditory input is intact.
There is growing interest in the benefit that listeners receive from audio-visual integration when the auditory signal is compromised. Remez et al. (1981) studied intelligibility when the speech waveform is reduced to three sine waves that track the first three formants of the original signal, and found that such sine wave speech remains highly intelligible even though a considerable amount of information has been removed. Grant and Seitz (1998) examined the audio-visual integration performance of hearing-impaired listeners across a variety of integration tasks using nonsense syllables and sentences. Their results showed that even when the auditory signal is poor, speech perception improves substantially with the aid of visual cues. However, a large degree of variability was observed in the benefit listeners received from audio-visual integration, and further analysis suggested that at least some of this variability can be attributed to individual differences in listeners' abilities to integrate auditory and visual speech information.
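For readers unfamiliar with the stimulus, sine wave speech replaces the natural speech signal with time-varying sinusoids that follow the center frequencies (and typically the amplitudes) of the first few formants. The sketch below is a minimal illustration of that idea, not the procedure used by Remez et al. (1981) or in the present study; the function name and its inputs (formant_freqs, formant_amps, frame_rate) are hypothetical, and the formant tracks are assumed to come from a separate formant analysis.

```python
import numpy as np

def sine_wave_speech(formant_freqs, formant_amps, frame_rate, sample_rate=16000):
    """Illustrative sine-wave-speech synthesis from frame-wise formant tracks.

    formant_freqs: array of shape (n_frames, 3), F1-F3 center frequencies in Hz
    formant_amps:  array of shape (n_frames, 3), corresponding amplitudes
    frame_rate:    number of analysis frames per second
    """
    n_frames = formant_freqs.shape[0]
    samples_per_frame = int(sample_rate / frame_rate)
    n_samples = n_frames * samples_per_frame
    t_frames = np.arange(n_frames) / frame_rate      # time of each analysis frame
    t = np.arange(n_samples) / sample_rate           # time of each audio sample

    out = np.zeros(n_samples)
    for k in range(3):
        # Interpolate the frame-wise frequency and amplitude tracks to audio rate.
        freq = np.interp(t, t_frames, formant_freqs[:, k])
        amp = np.interp(t, t_frames, formant_amps[:, k])
        # Integrate instantaneous frequency to obtain a smoothly varying phase.
        phase = 2 * np.pi * np.cumsum(freq) / sample_rate
        out += amp * np.sin(phase)

    # Normalize so the summed sinusoids do not clip when written to a sound file.
    return out / np.max(np.abs(out))
```

Summing only these three sinusoids discards the harmonic structure and broadband cues of natural speech, which is why the resulting signal is so severely reduced relative to the original.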
Studies in our laboratory have explored differences in the benefit that listeners receive from visual cues during audio-visual integration. We propose that one source of this variability may be the overall amount of information available in the auditory signal.
A previous study in our laboratory (Tamosiunas, 2007) explored the audio-visual benefit that listeners received with highly degraded sine wave speech. Results indicated that listeners received little benefit from the addition of visual cues and that, in some cases, these cues actually inhibited speech perception. A possible explanation for the difficulties in speech perception found in that study was subjects' limited exposure to sine wave speech.
The present study explored whether the lack of audio-visual integration and benefit seen in Tamosiunas' (2007) study resulted from unfamiliarity with sine wave speech or whether this degree of auditory signal degradation itself inhibits audio-visual integration. To address this question, listeners in the present study were given auditory and audio-visual training in sine wave speech perception. Results show that with training and exposure, speech perception performance improved in both the auditory and audio-visual conditions.