An Investigation of Missing Data Methods for Classification Trees

43 Pages Posted: 3 Nov 2008

Yufeng Ding

affiliation not provided to SSRN

Jeffrey S. Simonoff

New York University (NYU) - Leonard N. Stern School of Business; New York University (NYU) - Department of Information, Operations, and Management Sciences

Date Written: December 2006

Abstract

There are many different missing data methods used by classification tree algorithms, but few studies have been done comparing their appropriateness and performance. This paper provides both analytic and Monte Carlo evidence regarding the effectiveness of six popular missing data methods for classification trees. We show that in the context of classification trees, the relationship between the missingness and the dependent variable, rather than the standard missingness classification approach of Little and Rubin (2002) (missing completely at random (MCAR), missing at random (MAR) and not missing at random (NMAR)), is the most helpful criterion to distinguish different missing data methods. We make recommendations as to the best method to use in various situations. The paper concludes with discussion of a real data set related to predicting bankruptcy of a firm.

Keywords: C4.5, CART, Classification tree, Separate class

Suggested Citation

Ding, Yufeng and Simonoff, Jeffrey S., An Investigation of Missing Data Methods for Classification Trees (December 2006). Statistics Working Papers Series, Vol. , pp. -, 2006. Available at SSRN: https://ssrn.com/abstract=1293152

Yufeng Ding (Contact Author)

affiliation not provided to SSRN

No Address Available

Jeffrey S. Simonoff

New York University (NYU) - Leonard N. Stern School of Business

44 West 4th Street
Suite 9-160
New York, NY 10012
United States

New York University (NYU) - Department of Information, Operations, and Management Sciences

44 West Fourth Street
New York, NY 10012
United States
