The False Positive Problem of Automatic Bot Detection in Social Science Research

25 Pages. Posted: 16 Apr 2020. Last revised: 11 Feb 2021.

Adrian Rauchfleisch

National Taiwan University - Graduate Institute of Journalism

Jonas Kaiser

Suffolk University; Harvard University - Berkman Klein Center for Internet & Society

Date Written: March 1, 2020

Abstract

The identification of bots is an important and complicated task. The bot classifier Botometer was successfully introduced as a way to estimate the number of bots in a given list of accounts and, as a consequence, has been frequently used in academic publications. Given its relevance for academic research and for our understanding of the presence of automated accounts in any given Twitter discourse, we are interested in Botometer’s diagnostic ability over time. To assess it, we collected the Botometer scores for five datasets (three verified as bots, two verified as human; n=4,134) in two languages (English/German) over three months. We show that the Botometer scores are imprecise when it comes to estimating bots, especially in a different language. We further show in an analysis of Botometer scores over time that Botometer’s thresholds, even when used very conservatively, are prone to variance, which, in turn, will lead to false negatives (i.e., bots being classified as humans) and false positives (i.e., humans being classified as bots). This has immediate consequences for academic research, as most studies using the tool will unknowingly count a high number of human users as bots and vice versa. We conclude our study with a discussion of how computational social scientists should evaluate machine learning systems that are developed to identify bots.
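To make the terms in the abstract concrete, the following is a minimal Python sketch of how false positive and false negative rates follow from applying a fixed threshold to scores of accounts with known labels, and of how repeated scoring can flip an account's classification. It is not the authors' analysis code. The account data, the 0 to 1 score scale, the threshold of 0.76, and the names Account, classify, error_rates, and count_flipped are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Account:
    is_bot: bool          # verified ground-truth label (hypothetical)
    scores: list[float]   # scores collected at successive time points (made up)


def classify(score: float, threshold: float) -> bool:
    """Return True if the score is at or above the threshold (classified as bot)."""
    return score >= threshold


def error_rates(accounts: list[Account], threshold: float, t: int) -> tuple[float, float]:
    """Return (false positive rate, false negative rate) at time point t.

    False positive: a human classified as a bot.
    False negative: a bot classified as a human.
    """
    humans = [a for a in accounts if not a.is_bot]
    bots = [a for a in accounts if a.is_bot]
    fp = sum(classify(a.scores[t], threshold) for a in humans)
    fn = sum(not classify(a.scores[t], threshold) for a in bots)
    return fp / len(humans), fn / len(bots)


def count_flipped(accounts: list[Account], threshold: float) -> int:
    """Count accounts whose classification changes across their time points."""
    return sum(
        1 for a in accounts
        if len({classify(s, threshold) for s in a.scores}) > 1
    )


# Illustrative data only: three labeled bots and three labeled humans,
# each scored at two time points.
ACCOUNTS = [
    Account(True, [0.90, 0.85]),
    Account(True, [0.80, 0.70]),
    Account(True, [0.40, 0.78]),
    Account(False, [0.10, 0.20]),
    Account(False, [0.77, 0.50]),
    Account(False, [0.30, 0.35]),
]
THRESHOLD = 0.76  # assumed cut-off, chosen only for this example

if __name__ == "__main__":
    for t in range(2):
        fpr, fnr = error_rates(ACCOUNTS, THRESHOLD, t)
        print(f"time {t}: false positive rate={fpr:.2f}, false negative rate={fnr:.2f}")
    print("accounts that change class:", count_flipped(ACCOUNTS, THRESHOLD))
```

In this toy data, the error rates differ between the two time points even though the accounts and the threshold stay the same, which is the kind of instability over time that the abstract describes.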

The paper is now published in PLOS One: https://doi.org/10.1371/journal.pone.0241045

Suggested Citation

Rauchfleisch, Adrian and Kaiser, Jonas, The False Positive Problem of Automatic Bot Detection in Social Science Research (March 1, 2020). Berkman Klein Center Research Publication No. 2020-3, Available at SSRN: https://ssrn.com/abstract=3565233 or http://dx.doi.org/10.2139/ssrn.3565233

Adrian Rauchfleisch (Contact Author)

National Taiwan University - Graduate Institute of Journalism

No.1, Sec.4, Roosevelt Road
Taipei, 10617
Taiwan

Jonas Kaiser

Harvard University - Berkman Klein Center for Internet & Society

Harvard Law School
23 Everett, 2nd Floor
Cambridge, MA 02138
United States

Suffolk University

41 Temple Street
Boston, MA 02114
United States

Paper statistics

Downloads: 1,097
Abstract Views: 6,571
Rank: 36,839