A Comparison of Methods in Political Science Text Classification: Transfer Learning Language Models for Politics

25 Pages Posted: 14 Jan 2021

Zhanna Terechshenko

New York University (NYU)

Fridolin Linder

Pennsylvania State University

Vishakh Padmakumar

New York University (NYU)

Michael Liu

New York University Abu Dhabi

Jonathan Nagler

NYU - Wilf Family Department of Politics

Joshua A. Tucker

New York University (NYU)

Richard Bonneau

New York University (NYU) - New York University

Date Written: October 20, 2020

Abstract

Automated text classification has rapidly become an important tool for political analysis. Recent advancements in NLP enabled by advances in deep learning now achieve state-of-the-art results in many standard tasks for the field. However, these methods require large amounts of both computing power and text data to learn the characteristics of the language, resources which are not always accessible to political scientists. One solution is a transfer learning approach, where knowledge learned in one area or source task is transferred to another area or a target task. A class of models that embody this approach are language models, which demonstrate extremely high levels of performance. We investigate the performance of these models in political science by comparing multiple text classification methods. We find that RoBERTa and XLNet, language models that rely on the Transformer, require fewer computing resources and less training data to perform on par with, or outperform, several political science text classification methods. Moreover, we find that the increase in accuracy is especially significant when labeled data is scarce, highlighting the potential for reducing the data-labeling cost of supervised methods for political scientists via the use of pretrained language models.

Keywords: text classification; transfer learning; language models; transformers

Suggested Citation

Terechshenko, Zhanna and Linder, Fridolin and Padmakumar, Vishakh and Liu, Fengyuan and Nagler, Jonathan and Tucker, Joshua Aaron and Bonneau, Richard, A Comparison of Methods in Political Science Text Classification: Transfer Learning Language Models for Politics (October 20, 2020). Available at SSRN: https://ssrn.com/abstract=3724644 or http://dx.doi.org/10.2139/ssrn.3724644

Zhanna Terechshenko (Contact Author)

New York University (NYU)

Bobst Library, E-resource Acquisitions
20 Cooper Square 3rd Floor
New York, NY 10003-711
United States

Fridolin Linder

Pennsylvania State University

University Park
State College, PA 16802
United States

Vishakh Padmakumar

New York University (NYU)

Bobst Library, E-resource Acquisitions
20 Cooper Square 3rd Floor
New York, NY 10003-711
United States

Fengyuan Liu

New York University Abu Dhabi

PO Box 129188
Abu Dhabi
United Arab Emirates

Jonathan Nagler

NYU - Wilf Family Department of Politics

Dept of Politics - 2nd floor
19 W. 4th Street
New York, NY 10012
United States

Joshua Aaron Tucker

New York University (NYU)

Bobst Library, E-resource Acquisitions
20 Cooper Square 3rd Floor
New York, NY 10003-711
United States

Richard Bonneau

New York University (NYU) - New York University

Bobst Library, E-resource Acquisitions
20 Cooper Square 3rd Floor
New York, NY 10003-711
United States
