Multilingual Animacy Classification by Sparse Logistic Regression
Date
2010
Authors
Publisher
Ohio State University. Department of Linguistics
Abstract
This paper presents results from three experiments on automatic animacy classification in Japanese and English. We present experiments that focus on solutions to the problem of reliably classifying a large set of infrequent items using a small number of automatically extracted features. We labeled a set of Japanese nouns as ±animate on the basis of reliable, surface-obvious morphological features, producing an accurately but sparsely labeled data set. To classify these nouns, and to achieve good generalization to other nouns for which we do not have labels, we used feature vectors based on frequency counts of verb-argument relations that abstract away from item identity and into class-wide distributional tendencies of the feature set. Grouping items into suffix-based equivalence classes prior to classification increased data coverage and improved classification accuracy. For the items that occur at least once with our feature set, we obtained 95% classification accuracy. We used loanwords to transfer automatically acquired labels from English to classify items that are zero-frequency in the Japanese data set, giving increased precision on inanimate items and increased recall on animate items.
Citation
Working Papers in Linguistics, no. 59 (2010), 52-74.