Testing the Statistical Significance of an Ultra-High-Dimensional Naive Bayes Classifier

8 Pages Posted: 12 Apr 2012

Baiguo An

Capital University of Economics and Business

Hansheng Wang

Peking University - Guanghua School of Management

Jianhua Guo

Northeast Normal University

Date Written: April 12, 2012

Abstract

The naive Bayes approach is one of the most popular methods used for classification. Nevertheless, how to test its statistical significance under an ultra-high-dimensional (UHD) setup is not well understood. To fill this important theoretical gap, we propose a novel test statistic whose asymptotic null distribution is standard normal, even when the predictor dimension is considerably larger than the sample size. This makes the proposed method useful for UHD data analysis. Simulation studies demonstrate its finite-sample performance, and a text classification example is given for illustration.

Keywords: Binary Predictor, Hypotheses Testing, Naive Bayes, Supervised Learning, Text Classification, Ultra-High-Dimensional Data

JEL Classification: C30

Suggested Citation

An, Baiguo and Wang, Hansheng and Guo, Jianhua, Testing the Statistical Significance of an Ultra-High-Dimensional Naive Bayes Classifier (April 12, 2012). Available at SSRN: https://ssrn.com/abstract=2039110 or http://dx.doi.org/10.2139/ssrn.2039110

Baiguo An

Capital University of Economics and Business
Beijing, Beijing
China

Hansheng Wang (Contact Author)

Peking University - Guanghua School of Management
Beijing, Beijing 100871
China

HOME PAGE: http://hansheng.gsm.pku.edu.cn

Jianhua Guo

Northeast Normal University
Changchun
China
