Machine Learning as a Tool for Hypothesis Generation

124 Pages Posted: 14 Mar 2023

Jens Ludwig

Georgetown University - Public Policy Institute (GPPI); National Bureau of Economic Research (NBER); IZA Institute of Labor Economics

Sendhil Mullainathan

University of Chicago; National Bureau of Economic Research (NBER)

There are 2 versions of this paper

Date Written: March 14, 2023

Abstract

While hypothesis testing is a highly formalized activity, hypothesis generation remains largely informal. We propose a procedure that uses machine learning algorithms, and their capacity to notice patterns people might not, to generate novel hypotheses about human behavior. We illustrate the procedure with a concrete application: judge decisions. We begin with a striking fact: up to half of the predictable variation in who judges jail is explained solely by the pixels in the defendant’s mugshot, that is, by the predictions from an algorithm built using just facial images. We develop a procedure that allows human subjects to interact with this black-box algorithm to produce hypotheses about what in the face influences judge decisions. The procedure generates hypotheses that are both interpretable and novel: they are not explained by factors implied by existing research (demographics, facial features emphasized by previous psychology studies), nor are they already known (even if just tacitly) to people or even experts. Though these results are specific, our procedure is general. It provides a way to produce novel, interpretable hypotheses from any high-dimensional dataset (e.g., cell phones, satellites, online behavior, news headlines, corporate filings, and high-frequency time series). A central tenet of our paper is that hypothesis generation is in and of itself a valuable activity, and we hope this encourages future work in this largely “pre-scientific” stage of science.
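
As a reading aid, the "share of predictable variation" statement can be illustrated with a small sketch. This is not the authors' code or necessarily their metric. It assumes the share is measured as a ratio of R-squared values, comparing a predictor restricted to one input with a predictor that uses all inputs. The data are synthetic, the outcome is treated as continuous rather than as a binary jail decision, and the names `face`, `other`, `r_squared`, and `share_of_predictable_variation` are hypothetical.

```python
import random
from statistics import fmean


def r_squared(y, pred):
    """Fraction of the variance in y accounted for by pred (1 - SSE/SST)."""
    mean_y = fmean(y)
    sse = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
    sst = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - sse / sst


def share_of_predictable_variation(y, pred_restricted, pred_full):
    """Ratio of the restricted predictor's R^2 to the full predictor's R^2.

    Assumption: this ratio is one plausible reading of "share of the
    predictable variation"; the paper may define the quantity differently.
    """
    return r_squared(y, pred_restricted) / r_squared(y, pred_full)


# Synthetic data (entirely made up for illustration).
rng = random.Random(0)
n = 5000
face = [rng.gauss(0, 1) for _ in range(n)]    # stand-in for a face-based signal
other = [rng.gauss(0, 1) for _ in range(n)]   # stand-in for all other inputs
noise = [rng.gauss(0, 2) for _ in range(n)]   # unpredictable component
y = [f + o + e for f, o, e in zip(face, other, noise)]

# Idealized predictors: in practice these would come from fitted models
# evaluated on held-out data, not from the true data-generating terms.
pred_face = face
pred_full = [f + o for f, o in zip(face, other)]

share = share_of_predictable_variation(y, pred_face, pred_full)
print(f"R^2 (face only): {r_squared(y, pred_face):.3f}")
print(f"R^2 (all inputs): {r_squared(y, pred_full):.3f}")
print(f"Share of predictable variation from face signal: {share:.3f}")
```

By construction, the face-like signal and the other inputs contribute equal variance here, so the share comes out near one half. With real data the two predictors would be separately trained models, and the ratio would depend on how each is fitted and evaluated.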

Keywords: Hypothesis generation, algorithms, artificial intelligence

JEL Classification: B4, C1

Suggested Citation

Ludwig, Jens and Mullainathan, Sendhil, Machine Learning as a Tool for Hypothesis Generation (March 14, 2023). University of Chicago, Becker Friedman Institute for Economics Working Paper No. 2023-28, Available at SSRN: https://ssrn.com/abstract=4387832 or http://dx.doi.org/10.2139/ssrn.4387832

Jens Ludwig (Contact Author)

Georgetown University - Public Policy Institute (GPPI)

3600 N Street, NW Suite 200
Washington, DC 20057
United States

National Bureau of Economic Research (NBER)

1050 Massachusetts Avenue
Cambridge, MA 02138
United States

IZA Institute of Labor Economics

P.O. Box 7240
Bonn, D-53072
Germany

Sendhil Mullainathan

University of Chicago

1101 East 58th Street
Chicago, IL 60637
United States

National Bureau of Economic Research (NBER)

1050 Massachusetts Avenue
Cambridge, MA 02138
United States
617-588-1473 (Phone)
617-876-2742 (Fax)
