Handling Missing Values in Information Systems Research: A Review of Methods and Assumptions
Peng, J., Hahn, J. and Huang, K.-W. (forthcoming) “Handling Missing Values in Information Systems Research: A Review of Methods and Assumptions,” Information Systems Research
37 Pages. Posted: 14 Apr 2020. Last revised: 2 Mar 2022
Date Written: January 28, 2020
Abstract
In today’s big data environment, missing values continue to be a problem that harms data quality. The bias caused by missing values is of greatest concern because it cannot be eliminated simply by increasing the sample size. Although the statistics literature has developed approaches to handling missing values and formulated assumptions regarding when these approaches generate valid statistical inferences, these prescriptions have yet to be broadly accepted in many social science disciplines, including the Information Systems (IS) discipline. By reviewing recently published empirical research in information systems, we find that missing values are indeed an important and pervasive problem. We believe that a review of missing value theory is necessary for the IS community to understand the nature of missing values and to promote more rigorous research practice, given that missing values are often unavoidable. In addition, the not missing at random (NMAR) mechanism poses challenges for parameter estimation. We contribute to research practice by proposing a Monte Carlo likelihood approach and demonstrating its superior performance in correcting bias in parameter estimation. We conclude by suggesting that research validity can be enhanced through the reasoned adoption of missing value handling methods and missing value reporting practices.
Keywords: Missing Values, Econometrics, Missingness Mechanism, Not Missing at Random, Data Quality, Statistical Inference