Ensemble Subsampling for Imbalanced Multivariate Two-Sample Tests

Journal of the American Statistical Association, 2013

40 Pages. Posted: 20 Jun 2013. Last revised: 23 Oct 2019

Lisha Chen

Yale University - Department of Statistics

Winston Dou

The Wharton School, University of Pennsylvania

Zhihua Qiao

Independent

Date Written: August 5, 2013

Abstract

Some existing nonparametric two-sample tests for equality of multivariate distributions perform unsatisfactorily when the two sample sizes are unbalanced. In particular, the power of these tests tends to diminish as the sample sizes become more unbalanced. In this paper, we propose a new testing procedure to solve this problem. The proposed test, based on the nearest neighbor method of Schilling (1986a), employs a novel ensemble subsampling scheme: the test statistic is a weighted average of a collection of statistics, each associated with a randomly selected subsample of the data. We derive the asymptotic distribution of the test statistic under the null hypothesis and show that the new test is consistent against all alternatives when the ratio of the sample sizes either goes to a finite limit or tends to infinity. Using simulated data examples, we demonstrate that the power of the new test increases with the sample size ratio when the size of the smaller sample is fixed. The test is applied to a real data example in the field of corporate finance.
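
The general idea in the abstract can be illustrated with a small sketch. This is not the authors' procedure: the abstract does not specify the subsampling scheme, the weights, or the null calibration, so the code below assumes equal weights, balanced subsamples drawn from the larger sample, and a permutation null in place of the asymptotic distribution. The function names (`schilling_stat`, `ensemble_stat`, `permutation_pvalue`) are hypothetical, and the nearest-neighbor search is brute force for readability.

```python
import random

def schilling_stat(x, y, k=3):
    """Nearest-neighbor coincidence statistic in the spirit of Schilling (1986a):
    the fraction of k-nearest-neighbor links in the pooled sample that join
    two points from the same sample."""
    pooled = [(p, 0) for p in x] + [(p, 1) for p in y]
    n = len(pooled)
    same = 0
    for i, (p, label) in enumerate(pooled):
        # Squared Euclidean distance from point i to every other pooled point.
        dists = sorted(
            (sum((a - b) ** 2 for a, b in zip(p, q)), j)
            for j, (q, _) in enumerate(pooled) if j != i
        )
        same += sum(1 for _, j in dists[:k] if pooled[j][1] == label)
    return same / (n * k)

def ensemble_stat(small, large, k=3, n_subsamples=20, rng=None):
    """Average the statistic over random balanced subsamples of the larger sample.
    ASSUMPTION: equal weights and subsample size len(small); the paper's
    actual weighting is not given in the abstract."""
    rng = rng or random.Random(0)
    m = len(small)
    stats = [schilling_stat(small, rng.sample(large, m), k)
             for _ in range(n_subsamples)]
    return sum(stats) / len(stats)

def permutation_pvalue(small, large, k=3, n_subsamples=10, n_perm=99, seed=0):
    """ASSUMPTION: calibrate by permuting sample labels, rather than by the
    asymptotic null distribution derived in the paper."""
    rng = random.Random(seed)
    observed = ensemble_stat(small, large, k, n_subsamples, rng)
    pooled = list(small) + list(large)
    m = len(small)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        permuted = ensemble_stat(pooled[:m], pooled[m:], k, n_subsamples, rng)
        if permuted >= observed:
            exceed += 1
    # Add-one correction keeps the p-value strictly positive.
    return observed, (exceed + 1) / (n_perm + 1)
```

Points are passed as tuples of coordinates, with `small` the smaller sample. Large values of the statistic indicate that neighbors tend to come from the same sample, which is evidence against equality of the two distributions.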

Keywords: Corporate Finance, ensemble methods, imbalanced learning, Kolmogorov-Smirnov test, nearest neighbor methods, subsampling methods, multivariate two-sample tests

JEL Classification: C10, C40, C52, G32, G35

Suggested Citation

Chen, Lisha and Dou, Winston and Qiao, Zhihua, Ensemble Subsampling for Imbalanced Multivariate Two-Sample Tests (August 5, 2013). Journal of the American Statistical Association, 2013. Available at SSRN: https://ssrn.com/abstract=2282034

Lisha Chen

Yale University - Department of Statistics

P.O. Box 208290
New Haven, CT 06520
United States

Winston Dou (Contact Author)

The Wharton School, University of Pennsylvania

2318 Steinberg Hall - Dietrich Hall
3620 Locust Walk
Philadelphia, PA 19104
United States

Zhihua Qiao

Independent
