Unstructured Data, Econometric Models, and Estimation Bias

37 Pages. Posted: 20 May 2022. Last revised: 12 Sep 2023.

Yanhao 'Max' Wei

University of Southern California - Marshall School of Business

Nikhil Malik

Marshall School of Business, USC

Date Written: May 26, 2022

Abstract

This article examines the powerful combination of machine learning and econometric models for studying unstructured data. Researchers estimate an econometric model (e.g., discrete choice) that relates an outcome of interest (e.g., consumer purchases) to a focal feature in unstructured data (e.g., pet presence in product images), with the feature extracted by machine learning. We examine the potential bias in the estimates of the econometric model caused by machine learning extraction errors. Extraction errors are not white noise but functions of the unstructured data. Consequently, the mechanisms and directions of the bias differ from those arising from measurement error. We derive general approaches to alleviate the bias relative to an “oracle” who directly observes the focal feature. These approaches extend and improve on the few pioneering works in this area by (i) covering general nonlinear econometric models and (ii) removing the restrictive assumption that non-focal features of unstructured data have no effect on the outcome of interest.
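
The contrast between extraction errors and white-noise measurement errors can be illustrated with a small simulation. The sketch below is not the paper's method or model: it assumes a linear outcome equation estimated by OLS (the paper covers general nonlinear models), a binary focal feature `x`, a single non-focal feature `z` that also affects the outcome, and a hypothetical extractor whose mistakes depend on `z`. All names, coefficients, and the data-generating process are illustrative assumptions.

```python
import numpy as np


def ols_slope(y, x):
    """Slope from a simple OLS regression of y on x with an intercept."""
    x_centered = x - x.mean()
    return float(x_centered @ (y - y.mean()) / (x_centered @ x_centered))


def simulate(n=200_000, beta_focal=1.0, beta_nonfocal=1.0, seed=0):
    """Compare slope estimates under three versions of the focal feature.

    Assumed (hypothetical) data-generating process:
      x : true focal feature, e.g. pet presence (binary)
      z : non-focal feature of the same unstructured data, independent of x
      y : outcome, affected by both x and z
    """
    rng = np.random.default_rng(seed)
    x = rng.integers(0, 2, size=n).astype(float)
    z = rng.standard_normal(n)
    y = beta_focal * x + beta_nonfocal * z + rng.standard_normal(n)

    # Hypothetical ML extractor: its errors are a function of z,
    # so the extracted feature is correlated with the non-focal feature.
    x_hat = ((x - 0.5 + 0.5 * z) > 0).astype(float)
    error_rate = float(np.mean(x_hat != x))

    # Benchmark: flip x at the same rate, but independently of z
    # (the white-noise style of measurement error).
    flips = rng.random(n) < error_rate
    x_noisy = np.where(flips, 1.0 - x, x)

    return {
        "error_rate": error_rate,
        "oracle": ols_slope(y, x),
        "independent_flips": ols_slope(y, x_noisy),
        "ml_extracted": ols_slope(y, x_hat),
    }


if __name__ == "__main__":
    for name, value in simulate().items():
        print(f"{name:>18}: {value:.3f}")
```

Under these assumptions, the oracle regression recovers a slope close to the true coefficient, independent flips attenuate the estimate toward zero, and the `z`-dependent extractor pushes the estimate above the true value even though both noisy features have the same error rate. The direction of the second effect is specific to this setup, where the extractor's errors and the outcome both load positively on `z`; other designs could flip the sign.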

Keywords: unstructured data, econometric models, estimation bias, machine learning, extraction error, measurement error

Suggested Citation

Wei, Yanhao and Malik, Nikhil, Unstructured Data, Econometric Models, and Estimation Bias (May 26, 2022). USC Marshall School of Business Research Paper Sponsored by iORB, Available at SSRN: https://ssrn.com/abstract=4113608 or http://dx.doi.org/10.2139/ssrn.4113608

Yanhao Wei (Contact Author)

University of Southern California - Marshall School of Business

701 Exposition Blvd
Los Angeles, CA 90089
United States

Nikhil Malik

Marshall School of Business, USC

701 Exposition Blvd
Los Angeles, CA 90089
United States
