Credible Prediction: Big Data, Machine Learning and the Credibility Revolution

Forthcoming in Law as Data: Computation and the Future of Legal Analysis (SFI Press)

36 Pages. Posted: 24 Apr 2018. Last revised: 30 Jul 2018

Ryan Copus

Harvard Law School

Ryan Hubert

University of California, Davis - Department of Political Science

Hannah Laqueur

University of California, Davis

Date Written: April 1, 2018

Abstract

This essay addresses the place of machine learning in a post-"credibility revolution" landscape. We begin with an overview of machine learning. Then, we make four main points. First, design still trumps analysis. The lessons of the credibility revolution should not be forgotten in the excitement around machine learning: machine learning does nothing to address the problem of omitted variable bias. Nonetheless, machine learning can improve a researcher's data analysis. Indeed, with growing concerns about the reliability of even design-based research, perhaps we should be aiming for triangulation rather than design purism. Further, for some questions we do not have the luxury of waiting for a strong design, and we need a best approximation of an answer in the meantime. Second, even design-committed researchers should not ignore machine learning: it can be used in service of design-based studies to make causal estimates less variable, less biased, and more heterogeneous. Third, there are important policy-relevant prediction problems for which machine learning is particularly valuable (e.g., predicting recidivism in the criminal justice system). Yet even when research questions center on prediction, a focus on design is still essential. As with causal inference, researchers cannot simply rely on statistical models, but must also carefully consider threats to the validity of predictions. We briefly review some of these threats: GIGO ("garbage in, garbage out"), selective labels, and Campbell's law. Fourth, the predictive power of machine learning can be leveraged for descriptive research. Where possible, we illustrate these points using examples drawn from real-world research.

Suggested Citation

Copus, Ryan and Hubert, Ryan and Laqueur, Hannah, Credible Prediction: Big Data, Machine Learning and the Credibility Revolution (April 1, 2018). Forthcoming in Law as Data: Computation and the Future of Legal Analysis (SFI Press). Available at SSRN: https://ssrn.com/abstract=3156795

Ryan Copus

Harvard Law School

1575 Massachusetts
Hauser 406
Cambridge, MA 02138
United States

Ryan Hubert (Contact Author)

University of California, Davis - Department of Political Science

One Shields Avenue
Davis, CA 95616
United States

HOME PAGE: http://www.ryanhubert.com

Hannah Laqueur

University of California, Davis

One Shields Avenue
Davis, CA 95616
United States
9173642301 (Phone)
