Optimizing Acoustic Array Beamforming to Aid a Speech Recognition System
Abstract
iBrutus is a pilot project in the Computer Science and Engineering (CSE) department at OSU that develops human-computer interaction via spoken dialog. The goal of the iBrutus project is to design a kiosk with an on-screen talking avatar that answers questions at public events, such as a football game, in potentially noisy environments like Ohio Stadium.
In such an environment, the speech recognition software employed by the system would be ineffective without prior processing to obtain a cleaner speech signal. As a rule of thumb, if iBrutus could correctly interpret 70% or more of the words, it could successfully map the input to a known question or command. To improve the speech recognition rate, the author chose to research a beamforming algorithm. Such an algorithm combines the inputs from a microphone array to minimize interference while preserving the desired signal (i.e., speech arriving from a known direction or location).
The goal of the research was to develop such an algorithm, together with a means of testing it, to determine which of its parameters (such as the spatial geometry of the microphone array) produce the desired speech recognition rate in minimum processing time. The beamforming algorithm, designed by the author in MATLAB, is a frequency-domain wideband Minimum Variance Distortionless Response (MVDR) beamformer. Tests showed that a word recognition rate of at least 70% could be achieved under certain parameter choices. The processing time of the MATLAB-based algorithm is currently longer than desired for use with iBrutus, but there is potential for improvement.
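To make the MVDR idea concrete, the following is a minimal sketch of the standard narrowband weight formula w = R^{-1} d / (d^H R^{-1} d) at a single frequency, where R is the spatial covariance matrix of the microphone signals and d is the steering vector toward the talker. It is not the author's MATLAB implementation: the uniform linear array, the 5 cm spacing, the 1 kHz frequency, the interferer direction, the synthetic covariance matrix, and all function names are assumptions chosen for illustration. A frequency-domain wideband beamformer would typically repeat such a computation for each frequency bin.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s (assumed room-temperature value)

def steering_vector(mic_positions, angle_rad, freq_hz):
    """Far-field steering vector for microphones on a line (positions in metres)."""
    return [cmath.exp(-2j * math.pi * freq_hz * x * math.sin(angle_rad) / SPEED_OF_SOUND)
            for x in mic_positions]

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0j] * n
    for i in reversed(range(n)):
        s = M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / M[i][i]
    return x

def mvdr_weights(R, d):
    """MVDR weights w = R^{-1} d / (d^H R^{-1} d)."""
    r_inv_d = solve(R, d)
    denom = sum(di.conjugate() * v for di, v in zip(d, r_inv_d))
    return [v / denom for v in r_inv_d]

def response(w, v):
    """Beamformer response w^H v to a signal with spatial signature v."""
    return sum(wi.conjugate() * vi for wi, vi in zip(w, v))

# Hypothetical scenario: 4 microphones, 5 cm apart, evaluated at 1 kHz.
mics = [0.05 * k for k in range(4)]
freq = 1000.0
d = steering_vector(mics, 0.0, freq)                 # talker straight ahead
a = steering_vector(mics, math.radians(40.0), freq)  # interferer at 40 degrees

# Synthetic covariance: weak sensor noise plus one strong interferer.
noise_power, interferer_power = 0.01, 10.0
R = [[(noise_power if i == j else 0.0) + interferer_power * a[i] * a[j].conjugate()
      for j in range(4)] for i in range(4)]

w = mvdr_weights(R, d)
print("gain toward talker:    ", abs(response(w, d)))
print("gain toward interferer:", abs(response(w, a)))
```

In this sketch the distortionless constraint keeps the gain toward the talker at one, while minimizing output power pushes the gain toward the interferer close to zero. The dependence of the steering vectors on microphone positions is one way to see why array geometry is among the parameters worth testing.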