Gender-Based Language Modeling: Salient Features and Interpolation Methods
Publisher: The Ohio State University
Series/Report no.: The Ohio State University. Department of Computer Science and Engineering Honors Theses; 2006
Sociolinguistic studies suggest that a relationship exists between the gender of a speaker and the words she or he chooses in conversational speech and writing (1, 7, 8). However, little research within computational linguistics has examined gender classification as a tool for predicting word choice in automatic speech recognition systems. We present an analysis of a recent study by Boulis and Ostendorf (2005), which succeeded in using word choice to classify speaker gender automatically, but did not significantly improve perplexity in automatic speech recognition by using gender-based language models (1). Specifically, interpolating gender-based language models with a general, non-gendered model produced only small gains over the general model alone, and those gains were not significant. In our study, we replicate parts of the Boulis and Ostendorf study. We then use a chi-square test to define a measure of topicality different from the one used by Boulis and Ostendorf, and extend this measure to produce a nonlinear interpolation of the same language models. The results of each of our experiments are consistent with the conclusions of the Boulis and Ostendorf study, but avenues for exploring further interpolation techniques remain open.
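For readers unfamiliar with the terms used above, the following is a minimal sketch of linear interpolation between a general and a gender-specific language model, the perplexity measure used to compare them, and a Pearson chi-square statistic for how strongly a word is associated with one speaker group. It is an illustration only and not the method of the thesis or of Boulis and Ostendorf: the unigram models, the add-alpha smoothing, the fixed weight `lam`, and all function names are assumptions made for this example.

```python
import math
from collections import Counter


def unigram_model(tokens, vocab, alpha=1.0):
    """Add-alpha smoothed unigram probabilities over a fixed vocabulary (assumed model type)."""
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def interpolate(p_general, p_gender, lam):
    """Linear interpolation: lam * gender model + (1 - lam) * general model."""
    return {w: lam * p_gender[w] + (1.0 - lam) * p_general[w] for w in p_general}


def perplexity(model, tokens):
    """Perplexity of a unigram model on held-out tokens (every token must be in the vocabulary)."""
    log_prob = sum(math.log2(model[w]) for w in tokens)
    return 2.0 ** (-log_prob / len(tokens))


def chi_square(word, tokens_a, tokens_b):
    """Pearson chi-square statistic for the 2x2 table (word vs. other words) x (group A vs. group B)."""
    a = tokens_a.count(word)      # occurrences of the word in group A
    b = tokens_b.count(word)      # occurrences of the word in group B
    c = len(tokens_a) - a         # all other tokens in group A
    d = len(tokens_b) - b         # all other tokens in group B
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


# Toy, made-up data purely for illustration.
group_a = "the game was great and the team won".split()
group_b = "the dinner was lovely and the kids slept".split()
vocab = set(group_a) | set(group_b)

general = unigram_model(group_a + group_b, vocab)
gender_a = unigram_model(group_a, vocab)
mixed = interpolate(general, gender_a, lam=0.3)

held_out = "the team was great".split()
print(perplexity(general, held_out), perplexity(mixed, held_out))
print(chi_square("team", group_a, group_b))
```

A nonlinear interpolation of the kind the abstract mentions would replace the single weight `lam` with a weight that varies by word, for instance as a function of a chi-square score; the abstract does not specify that function, so it is not sketched here.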
GE Diversity Fund