Multilingual Animacy Classification by Sparse Logistic Regression

Date

2010

Publisher

Ohio State University. Department of Linguistics

Abstract

This paper presents results from three experiments on automatic animacy classification in Japanese and English. The experiments address the problem of reliably classifying a large set of infrequent items using a small number of automatically extracted features. We labeled a set of Japanese nouns as ±animate on the basis of reliable, surface-obvious morphological features, producing an accurately but sparsely labeled data set. To classify these nouns, and to generalize well to other nouns for which we do not have labels, we used feature vectors based on frequency counts of verb-argument relations; these vectors abstract away from item identity toward the class-wide distributional tendencies of the feature set. Grouping items into suffix-based equivalence classes prior to classification increased data coverage and improved classification accuracy. For items that occur at least once with our feature set, we obtained 95% classification accuracy. We used loanwords to transfer automatically acquired labels from English to items that have zero frequency in the Japanese data set, which increased precision on inanimate items and recall on animate items.
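The abstract does not give implementation details. As a rough, hypothetical illustration of two ideas it names, pooling verb-argument counts over suffix-based equivalence classes and classifying with a sparse logistic regression, here is a minimal sketch. Everything concrete in it is an assumption, not the authors' method: the romanized nouns, the suffix list (`SUFFIXES`), the relation names in `VOCAB`, the relative-frequency normalization, and the choice of L1-regularized logistic regression fit by proximal gradient descent (ISTA) as the "sparse" model.

```python
import math
from collections import defaultdict

# Hypothetical toy data (not from the paper): romanized Japanese nouns mapped to
# counts of verb-argument relations. Relation names are illustrative only.
COUNTS = {
    "gakusha":    {"subj_of:omou": 2, "subj_of:hanasu": 1},
    "kisha":      {"subj_of:omou": 1},
    "isha":       {},  # zero-frequency with this feature set
    "kaishain":   {"subj_of:omou": 1, "subj_of:hanasu": 1},
    "ginkouin":   {"subj_of:omou": 1, "subj_of:hanasu": 1},
    "hikouki":    {"obj_of:tsukau": 2, "obj_of:naosu": 1},
    "sentakuki":  {"obj_of:tsukau": 1},
    "toshokan":   {"obj_of:tsukau": 1, "obj_of:naosu": 1},
    "bijutsukan": {"obj_of:tsukau": 1, "obj_of:naosu": 1},
}
VOCAB = ["subj_of:omou", "subj_of:hanasu", "obj_of:tsukau", "obj_of:naosu"]
SUFFIXES = ["kan", "sha", "in", "ki"]                  # hypothetical suffix inventory
CLASS_LABELS = {"sha": 1, "in": 1, "ki": 0, "kan": 0}  # 1 = animate (hypothetical)

def suffix_of(noun):
    """Map a noun to its suffix-based equivalence class (the noun itself if none match)."""
    for s in SUFFIXES:
        if noun.endswith(s):
            return s
    return noun

def group_by_suffix(counts):
    """Pool the feature counts of all nouns that fall in the same equivalence class."""
    pooled = defaultdict(lambda: defaultdict(float))
    for noun, feats in counts.items():
        bucket = pooled[suffix_of(noun)]
        for f, c in feats.items():
            bucket[f] += c
    return {k: dict(v) for k, v in pooled.items()}

def to_vector(feats, vocab):
    """Relative frequencies over the feature vocabulary; None if there is no evidence."""
    total = sum(feats.get(f, 0.0) for f in vocab)
    if total == 0:
        return None
    return [feats.get(f, 0.0) / total for f in vocab]

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def soft_threshold(w, t):
    """Proximal operator of the L1 penalty: shrink w toward zero by t."""
    return math.copysign(max(abs(w) - t, 0.0), w)

def train_l1_logreg(X, y, lam=0.01, lr=0.5, epochs=2000):
    """L1-regularized logistic regression fit by proximal gradient descent (ISTA)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(d):
                gw[j] += err * xi[j] / n
            gb += err / n
        # Gradient step on the mean log-loss, then L1 shrinkage (bias is not penalized).
        w = [soft_threshold(wj - lr * gj, lr * lam) for wj, gj in zip(w, gw)]
        b -= lr * gb
    return w, b

def predict(w, b, x):
    return 1 if b + sum(wj * xj for wj, xj in zip(w, x)) > 0 else 0

# Train on the pooled class vectors, then let every noun inherit its class's label.
pooled = group_by_suffix(COUNTS)
classes = sorted(CLASS_LABELS)
X = [to_vector(pooled[c], VOCAB) for c in classes]
y = [CLASS_LABELS[c] for c in classes]
w, b = train_l1_logreg(X, y)
class_pred = {c: predict(w, b, x) for c, x in zip(classes, X)}
noun_pred = {noun: class_pred[suffix_of(noun)] for noun in COUNTS}
print(class_pred)
print(noun_pred["isha"], noun_pred["toshokan"])
```

In this toy setup, `isha` has no feature counts of its own, so `to_vector` returns `None` for it and it could not be classified directly; pooling gives it the evidence of its `sha` class. This mirrors, in miniature, the coverage gain the abstract attributes to suffix-based grouping. The soft-thresholding step is what can drive individual feature weights exactly to zero, which is the usual sense of "sparse" in L1-regularized models.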

Citation

Working Papers in Linguistics, no. 59 (2010), 52-74.