Testing the Statistical Significance of an Ultra-High-Dimensional Naive Bayes Classifier
8 Pages Posted: 12 Apr 2012
Date Written: April 12, 2012
The naive Bayes approach is one of the most popular methods used for classification. Nevertheless, how to test its statistical significance under an ultra-high-dimensional (UHD) setup is not well understood. To fill this important theoretical gap, we propose a novel test statistic whose asymptotic null distribution is standard normal, even when the predictor dimension is considerably larger than the sample size. This makes the proposed method useful for UHD data analysis. Simulation studies demonstrate its finite-sample performance, and a text classification example is given for illustration.
Keywords: Binary Predictor, Hypothesis Testing, Naive Bayes, Supervised Learning, Text Classification, Ultra-High-Dimensional Data
JEL Classification: C30