We Don't Know What We Don't Know: When and How the Use of Twitter's Public APIs Biases Scientific Inference

26 Pages. Posted: 4 Dec 2017. Last revised: 13 Apr 2019.

Rebekah Tromble

Leiden University - Department of Political Science

Andreas Storz

Leiden University - Institute of Political Science

Daniela Stockmann

Leiden University - Department of Political Science

Date Written: November 29, 2017

Abstract

Though Twitter research has proliferated, no standards for data collection have crystallized. When using keyword queries, the most common data sources (the Search and Streaming APIs) rarely return the full population of tweets, and scholars do not know whether their data constitute a representative sample. This paper seeks to provide the most comprehensive look to date at the potential biases that may result. Employing data derived from four identical keyword queries to the Firehose (which provides the full population of tweets but is cost-prohibitive), Streaming, and Search APIs, we use Kendall’s tau and logit regression analyses to understand the differences in the datasets, including which user and content characteristics make a tweet more or less likely to appear in sampled results. We find that there are indeed systematic differences that are likely to bias scholars’ findings in almost all datasets we examine, and we recommend significant caution in future Twitter research.
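
To make the rank-comparison idea concrete, the following is a minimal sketch of Kendall's tau applied to two collections of the same query. It is an illustration only, not the authors' procedure: the abstract does not say which quantities are ranked, so hashtag frequency is assumed here, and the function name `kendall_tau_a`, the hashtags, and all counts are hypothetical. The sketch uses the simple tau-a variant, which does not correct for ties.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs.

    Tied pairs count as neither concordant nor discordant.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical hashtag counts (NOT data from the paper):
# the full population versus a sampled collection of the same query.
firehose_counts = {"#a": 500, "#b": 300, "#c": 200, "#d": 100, "#e": 50}
sample_counts = {"#a": 40, "#b": 45, "#c": 20, "#d": 5, "#e": 8}

tags = sorted(firehose_counts)
tau = kendall_tau_a(
    [firehose_counts[t] for t in tags],
    [sample_counts[t] for t in tags],
)
print(f"Kendall's tau-a: {tau:.2f}")
```

A value near 1 would suggest the sampled collection preserves the ordering found in the full population, while lower values indicate that the sample reorders items and may therefore misrepresent their relative prominence.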

Keywords: twitter, data collection, APIs, bias

Suggested Citation

Tromble, Rebekah and Storz, Andreas and Stockmann, Daniela, We Don't Know What We Don't Know: When and How the Use of Twitter's Public APIs Biases Scientific Inference (November 29, 2017). Available at SSRN: https://ssrn.com/abstract=3079927 or http://dx.doi.org/10.2139/ssrn.3079927

Rebekah Tromble (Contact Author)

Leiden University - Department of Political Science

2333 AK Leiden
Netherlands

HOME PAGE: http://www.rebekahtromble.net

Andreas Storz

Leiden University - Institute of Political Science

Faculty of Social and Behavioural Sciences
PO Box 9555
Leiden, 2300 RB
Netherlands

Daniela Stockmann

Leiden University - Department of Political Science

2333 AK Leiden
Netherlands
+31 (0)71 527 3867 (Phone)

HOME PAGE: http://www.daniestockmann.net
