Multilingual Animacy Classification by Sparse Logistic Regression
Publisher: Ohio State University. Department of Linguistics
Citation: Working Papers in Linguistics, no. 59 (2010), 52–74.
This paper presents results from three experiments on automatic animacy classification in Japanese and English. The experiments address the problem of reliably classifying a large set of infrequent items using a small number of automatically extracted features. We labeled a set of Japanese nouns as ±animate on the basis of reliable, surface-obvious morphological features, producing an accurately but sparsely labeled data set. To classify these nouns, and to generalize well to other nouns for which we have no labels, we used feature vectors based on frequency counts of verb-argument relations; these vectors abstract away from item identity toward class-wide distributional tendencies of the feature set. Grouping items into suffix-based equivalence classes prior to classification increased data coverage and improved classification accuracy. For items that occur at least once with our feature set, we obtained 95% classification accuracy. To classify items that have zero frequency in the Japanese data set, we used loanwords to transfer automatically acquired labels from English, which increased precision on inanimate items and recall on animate items.
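The pipeline summarized above (suffix-based pooling of verb-argument counts, then a sparse linear classifier) can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the paper's implementation: it assumes "sparse" means an L1 penalty on the weights, fit here by proximal gradient descent (ISTA); it assumes a suffix equivalence class is simply the noun's final character; and the nouns, feature names (`subj:speak`, `obj:drive`, ...), counts, and class labels are all hypothetical toy data. The helper names (`group_by_suffix`, `train_l1_logreg`, `classify`) are likewise illustrative.

```python
import math
from collections import defaultdict

# HYPOTHETICAL toy data: noun -> {verb-argument feature: count}.
# Feature names and counts are invented for illustration only.
COUNTS = {
    "学者": {"subj:speak": 5, "subj:walk": 2},
    "医者": {"subj:speak": 3, "subj:walk": 1},
    "会員": {"subj:speak": 2, "subj:walk": 3},
    "店員": {"subj:speak": 4},
    "飛行機": {"obj:drive": 3, "loc:enter": 1},
    "洗濯機": {"obj:drive": 1},
    "図書館": {"loc:enter": 4},
    "映画館": {"loc:enter": 2},
}
# HYPOTHETICAL +/-animate labels, one per suffix class (1 = animate).
CLASS_LABELS = {"者": 1, "員": 1, "機": 0, "館": 0}
VOCAB = ["subj:speak", "subj:walk", "obj:drive", "loc:enter"]


def group_by_suffix(counts, suffix_len=1):
    """Pool the feature counts of all nouns sharing the same final characters."""
    pooled = defaultdict(lambda: defaultdict(int))
    for noun, feats in counts.items():
        for feat, c in feats.items():
            pooled[noun[-suffix_len:]][feat] += c
    return {key: dict(feats) for key, feats in pooled.items()}


def to_vector(feats, vocab):
    """Relative frequencies over a fixed vocabulary, so raw item frequency drops out."""
    total = sum(feats.values())
    return [feats.get(f, 0) / total if total else 0.0 for f in vocab]


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(v, t):
    # Proximal operator of t * |v|: shrinks v toward zero, possibly to exactly zero.
    return math.copysign(max(abs(v) - t, 0.0), v)


def train_l1_logreg(X, y, lam=0.01, lr=0.5, epochs=500):
    """Proximal gradient descent (ISTA) on mean log-loss + lam * ||w||_1.

    The bias b is left unpenalized.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j] / n
            grad_b += err / n
        # Gradient step on the smooth loss, then L1 shrinkage on the weights.
        w = [soft_threshold(w[j] - lr * grad_w[j], lr * lam) for j in range(d)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Return 1 (animate) or 0 (inanimate)."""
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5 else 0


def classify(noun, class_vectors, w, b, suffix_len=1):
    """Classify any noun, seen or unseen, via its suffix class; None if the class is unknown."""
    vec = class_vectors.get(noun[-suffix_len:])
    return None if vec is None else predict(w, b, vec)


CLASS_VECTORS = {k: to_vector(v, VOCAB) for k, v in group_by_suffix(COUNTS).items()}
KEYS = list(CLASS_LABELS)
W, B = train_l1_logreg([CLASS_VECTORS[k] for k in KEYS], [CLASS_LABELS[k] for k in KEYS])

print("weights:", [round(x, 3) for x in W], "bias:", round(B, 3))
print("研究者 ->", classify("研究者", CLASS_VECTORS, W, B))  # unseen noun, suffix class 者
```

In this sketch, pooling counts per suffix class is what lets a noun with no counts of its own (研究者 above) still receive a prediction, which mirrors the coverage gain the abstract attributes to equivalence classes. The soft-thresholding step can set the weights of uninformative features exactly to zero; leaving the bias unpenalized is a design choice of the sketch, not something the abstract specifies.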
Rights: This object is protected by copyright, and is made available here for research and educational purposes. Permission to reuse, publish, or reproduce the object beyond the bounds of Fair Use or other exemptions to copyright law must be obtained from the copyright holder.