Credible Prediction: Big Data, Machine Learning and the Credibility Revolution
Forthcoming in Law as Data: Computation and the Future of Legal Analysis (SFI Press)
36 Pages. Posted: 24 Apr 2018. Last revised: 30 Jul 2018.
Date Written: April 1, 2018
This essay addresses the place of machine learning in a post-"credibility revolution" landscape. We begin with an overview of machine learning. Then, we make four main points. First, design still trumps analysis. The lessons of the credibility revolution should not be forgotten in the excitement around machine learning: machine learning does nothing to address the problem of omitted variable bias. Nonetheless, machine learning can improve a researcher's data analysis. Indeed, with growing concerns about the reliability of even design-based research, perhaps we should be aiming for triangulation rather than design purism. Further, for some questions we do not have the luxury of waiting for a strong design, and we need a best approximation of an answer in the meantime. Second, even design-committed researchers should not ignore machine learning: it can be used in service of design-based studies to make causal estimates less variable, less biased, and more heterogeneous. Third, there are important policy-relevant prediction problems for which machine learning is particularly valuable (e.g., predicting recidivism in the criminal justice system). Yet even with research questions centered around prediction, a focus on design is still essential. As with causal inference, researchers cannot simply rely on statistical models, but must also carefully consider threats to the validity of predictions. We briefly review some of these threats: GIGO ("garbage in, garbage out"), selective labels, and Campbell's law. Fourth, the predictive power of machine learning can be leveraged for descriptive research. Where possible, we illustrate these points using examples drawn from real-world research.