Feature Screening for Ultrahigh Dimensional Categorical Data with Applications

21 Pages, Posted: 15 Jan 2014

Danyang Huang

Peking University

Runze Li

Pennsylvania State University

Hansheng Wang

Peking University - Guanghua School of Management

Date Written: October 31, 2013

Abstract

Ultrahigh dimensional data with both categorical responses and categorical covariates are frequently encountered in the analysis of big data, for which feature screening has become an indispensable statistical tool. We propose a Pearson chi-square based feature screening procedure for a categorical response with ultrahigh dimensional categorical covariates. The proposed procedure can be applied directly to detect important interaction effects. We further show that the proposed procedure possesses the screening consistency property in the terminology of Fan and Lv (2008). We investigate the finite sample performance of the proposed procedure through Monte Carlo simulation studies, and illustrate the proposed method with two empirical datasets.
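
The abstract does not state the exact form of the screening statistic, so the following is only a minimal sketch of the general idea of chi-square based marginal screening, not the authors' procedure. It scores each categorical covariate by a Pearson chi-square-type measure of dependence with the response (the usual chi-square statistic divided by the sample size) and keeps the highest-scoring covariates. The function names `chi_square_score` and `screen_features`, and the choice to keep a fixed number `keep` of covariates, are assumptions made for illustration.

```python
import numpy as np


def chi_square_score(x, y):
    """Pearson chi-square-type dependence score between two 1-D categorical arrays.

    Equals the usual Pearson chi-square statistic divided by the sample size.
    """
    x_levels, x_idx = np.unique(np.asarray(x), return_inverse=True)
    y_levels, y_idx = np.unique(np.asarray(y), return_inverse=True)

    # Joint relative frequencies of (x level, y level).
    joint = np.zeros((len(x_levels), len(y_levels)))
    np.add.at(joint, (x_idx, y_idx), 1.0)
    joint /= joint.sum()

    # Product of marginals: the joint frequencies expected under independence.
    # Every observed level occurs at least once, so no entry is zero.
    expected = np.outer(joint.sum(axis=1), joint.sum(axis=0))
    return float(((joint - expected) ** 2 / expected).sum())


def screen_features(X, y, keep):
    """Rank the columns of X by their score against y and keep the top `keep`.

    Returns the selected column indices (best first) and all scores.
    """
    X = np.asarray(X)
    scores = np.array([chi_square_score(X[:, j], y) for j in range(X.shape[1])])
    order = np.argsort(-scores, kind="stable")
    return order[:keep], scores
```

Because each covariate is scored on its own, the cost grows linearly in the number of covariates. One way such a score could be pointed at interactions is to pass a combined covariate (for example, the pairing of two columns' levels encoded as a single label) as `x`; this is an illustrative assumption, not a description of how the paper handles interaction effects.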

Keywords: Feature Screening; Pearson’s Chi-Square Test; Screening Consistency; Search Engine Marketing; Text Classification; Ultrahigh Dimensional Data

JEL Classification: C10, C12, C13

Suggested Citation

Huang, Danyang and Li, Runze and Wang, Hansheng, Feature Screening for Ultrahigh Dimensional Categorical Data with Applications (October 31, 2013). Available at SSRN: https://ssrn.com/abstract=2378670 or http://dx.doi.org/10.2139/ssrn.2378670

Danyang Huang

Peking University

No. 38 Xueyuan Road
Haidian District
Beijing, Beijing 100871
China

Runze Li

Pennsylvania State University

University Park
State College, PA 16802
United States

Hansheng Wang (Contact Author)

Peking University - Guanghua School of Management

Peking University
Beijing, Beijing 100871
China

HOME PAGE: http://hansheng.gsm.pku.edu.cn
