Credible Prediction: Big Data, Machine Learning and the Credibility Revolution

Forthcoming in Law as Data: Computation and the Future of Legal Analysis (SFI Press)

36 Pages. Posted: 24 Apr 2018. Last revised: 30 Jul 2018


Ryan Copus

University of Missouri at Kansas City - School of Law

Ryan Hubert

University of California, Davis - Department of Political Science

Hannah Laqueur

University of California, Davis

Date Written: April 1, 2018

Abstract

This essay addresses the place of machine learning in a post-"credibility revolution" landscape. We begin with an overview of machine learning. Then, we make four main points. First, design still trumps analysis. The lessons of the credibility revolution should not be forgotten in the excitement around machine learning: machine learning does nothing to address the problem of omitted variable bias. Nonetheless, machine learning can improve a researcher's data analysis. Indeed, with growing concerns about the reliability of even design-based research, perhaps we should be aiming for triangulation rather than design purism. Further, for some questions we do not have the luxury of waiting for a strong design, and we need a best approximation of an answer in the meantime. Second, even design-committed researchers should not ignore machine learning: it can be used in service of design-based studies to make causal estimates less variable, less biased, and more heterogeneous. Third, there are important policy-relevant prediction problems for which machine learning is particularly valuable (e.g., predicting recidivism in the criminal justice system). Yet even with research questions centered around prediction, a focus on design is still essential. As with causal inference, researchers cannot simply rely on statistical models, but must also carefully consider threats to the validity of predictions. We briefly review some of these threats: GIGO ("garbage in, garbage out"), selective labels, and Campbell's law. Fourth, the predictive power of machine learning can be leveraged for descriptive research. Where possible, we illustrate these points using examples drawn from real-world research.

Suggested Citation

Copus, Ryan and Hubert, Ryan and Laqueur, Hannah, Credible Prediction: Big Data, Machine Learning and the Credibility Revolution (April 1, 2018). Forthcoming in Law as Data: Computation and the Future of Legal Analysis (SFI Press), Available at SSRN: https://ssrn.com/abstract=3156795

Ryan Copus

University of Missouri at Kansas City - School of Law

5100 Rockhill Road
Kansas City, MO 64110-2499
United States

Ryan Hubert (Contact Author)

University of California, Davis - Department of Political Science

One Shields Avenue
Davis, CA 95616
United States

HOME PAGE: http://www.ryanhubert.com

Hannah Laqueur

University of California, Davis

One Shields Avenue
Apt 153
Davis, CA 95616
United States
9173642301 (Phone)

