An Investigation of Language Model Interpretability via Sentence Editing

Date

2021-05

Publisher

The Ohio State University

Abstract

Pre-trained language models (PLMs) such as BERT are used for almost all language-related tasks, but interpreting their behavior remains a significant challenge, and many important questions are still largely unanswered. For example, how does domain-specific pre-training change the dynamics within a model? Is task-specific fine-tuning necessary for model interpretability? Which interpretability techniques correlate best with human rationales? In this work, we re-purpose a sentence editing dataset, in which high-quality human rationales can be automatically extracted and compared with model rationales, as a new testbed for interpretability. This enables us to conduct a systematic investigation of these open questions about PLM interpretability and to generate new insights. The dataset and code will be released to facilitate future research on interpretability.
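To make the testbed concrete, here is a minimal sketch (not the released code) of one way such a comparison could work: tokens touched by a sentence edit are treated as the human rationale, and a model's token attribution scores are binarized at top-k and scored against it with token-level F1. The attribution scores in the usage example are made up for illustration.

```python
# A minimal sketch (not the released code): derive a "human rationale" from a
# sentence edit via token diff, then score an arbitrary model attribution
# vector against it with token-level F1.
import difflib

def human_rationale(source_tokens, edited_tokens):
    """Mark source tokens that the edit replaced or deleted as rationale (1)."""
    mask = [0] * len(source_tokens)
    matcher = difflib.SequenceMatcher(a=source_tokens, b=edited_tokens)
    for op, i1, i2, _, _ in matcher.get_opcodes():
        if op in ("replace", "delete"):
            for i in range(i1, i2):
                mask[i] = 1
    return mask

def rationale_f1(human_mask, model_scores, top_k=None):
    """Binarize model attributions at top-k (default: the human rationale
    length) and compute token-level F1 against the human mask."""
    k = top_k if top_k is not None else sum(human_mask)
    top = set(sorted(range(len(model_scores)),
                     key=lambda i: -model_scores[i])[:k])
    model_mask = [1 if i in top else 0 for i in range(len(model_scores))]
    tp = sum(h and m for h, m in zip(human_mask, model_mask))
    if tp == 0:
        return 0.0
    precision = tp / sum(model_mask)
    recall = tp / sum(human_mask)
    return 2 * precision * recall / (precision + recall)

# Usage: the edit fixes a typo, so "quitely" is the human rationale token.
source = "the cat sat quitely on the mat".split()
edited = "the cat sat quietly on the mat".split()
mask = human_rationale(source, edited)        # [0, 0, 0, 1, 0, 0, 0]
scores = [0.1, 0.2, 0.1, 0.9, 0.1, 0.1, 0.2]  # hypothetical attributions
print(rationale_f1(mask, scores))             # 1.0
```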

Keywords

natural language processing, interpretability, pre-trained language models, faithful human rationales
