Unstructured Data, Econometric Models, and Estimation Bias
37 Pages. Posted: 20 May 2022. Last revised: 12 Sep 2023.
Date Written: May 26, 2022
Abstract
This article examines the combination of machine learning and econometric models for studying unstructured data. In this approach, researchers estimate an econometric model (e.g., discrete choice) that relates an outcome of interest (e.g., consumer purchases) to a focal feature in unstructured data (e.g., pet presence in product images), with the feature extracted by machine learning. We examine the potential bias in the estimates of the econometric model caused by machine learning extraction errors. Extraction errors are not white noise but functions of the unstructured data. Consequently, the mechanisms and directions of the bias differ from those arising from measurement error. We derive general approaches that alleviate the bias relative to an “oracle” who observes the focal feature directly. The approaches extend and improve on the few pioneering works in this area by (i) covering general nonlinear econometric models and (ii) removing the restrictive assumption that non-focal features of unstructured data have no effect on the outcome of interest.
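The contrast between extraction error and white-noise measurement error can be illustrated with a small simulation. The sketch below is not the paper's method or data: it assumes a hypothetical linear outcome model (chosen only for simplicity, whereas the paper covers nonlinear models), a made-up non-focal feature `z`, and a stylized "classifier" whose mistakes depend on `z`. All names and parameter values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Hypothetical data-generating process (all values are illustrative assumptions).
x = rng.integers(0, 2, size=n)   # true focal feature, e.g. pet present (1) or absent (0)
z = rng.normal(size=n)           # non-focal feature of the image, independent of x
y = 1.0 * x + 1.0 * z + rng.normal(size=n)  # outcome; the true focal effect is 1.0

# Stylized "ML extraction": the predicted label depends on z, so the
# extraction error is a function of the unstructured data, not white noise.
x_ml = (x + 0.8 * z > 0.5).astype(int)

# Benchmark: flip labels purely at random, at the same overall error rate.
err_rate = np.mean(x_ml != x)
flip = rng.random(n) < err_rate
x_flip = np.where(flip, 1 - x, x)


def ols_slope(regressor, outcome):
    """Slope from a simple regression of outcome on one regressor."""
    return np.cov(regressor, outcome)[0, 1] / np.var(regressor, ddof=1)


beta_oracle = ols_slope(x, y)     # the "oracle" observes the focal feature directly
beta_ml = ols_slope(x_ml, y)      # feature extracted with data-dependent errors
beta_flip = ols_slope(x_flip, y)  # feature corrupted by random misclassification

print(f"error rate of extraction: {err_rate:.3f}")
print(f"oracle: {beta_oracle:.3f}  ML-extracted: {beta_ml:.3f}  random flips: {beta_flip:.3f}")
```

In this particular setup the randomly flipped feature pulls the estimate toward zero, while the data-dependent extraction error pushes it above the oracle estimate because the predicted label also picks up the effect of `z`. The direction and size of the bias are artifacts of the assumed parameters and would change under other designs.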
Keywords: unstructured data, econometric models, estimation bias, machine learning, extraction error, measurement error