Ensemble Subsampling for Imbalanced Multivariate Two-Sample Tests
Journal of the American Statistical Association, 2013
40 Pages. Posted: 20 Jun 2013. Last revised: 23 Oct 2019
Date Written: August 5, 2013
Abstract
Some existing nonparametric two-sample tests for equality of multivariate distributions perform unsatisfactorily when the two sample sizes are unbalanced. In particular, the power of these tests tends to diminish as the sample sizes become more unbalanced. In this paper, we propose a new testing procedure to address this problem. The proposed test builds on the nearest neighbor method of Schilling (1986a) and employs a novel ensemble subsampling scheme. More specifically, the test statistic is a weighted average of a collection of statistics, each associated with a randomly selected subsample of the data. We derive the asymptotic distribution of the test statistic under the null hypothesis and show that the new test is consistent against all alternatives when the ratio of the sample sizes either goes to a finite limit or tends to infinity. Using simulated data examples, we demonstrate that the power of the new test increases with the sample size ratio when the size of the smaller sample is fixed. The test is applied to a real data example in the field of corporate finance.
Keywords: Corporate Finance, ensemble methods, imbalanced learning, Kolmogorov-Smirnov test, nearest neighbor methods, subsampling methods, multivariate two-sample tests
JEL Classification: C10, C40, C52, G32, G35
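
The idea described in the abstract, averaging nearest neighbor statistics over randomly selected subsamples, can be illustrated with a minimal Python sketch. It is a simplified illustration under stated assumptions, not the paper's procedure: the subsamples are balanced draws from the larger sample, the weights are equal, and the p-value comes from a permutation scheme rather than the asymptotic null distribution derived in the paper. The function names and the parameters `k`, `n_subsamples`, and `n_perm` are hypothetical.

```python
import numpy as np

def schilling_statistic(x, y, k=3):
    """Fraction of k-nearest-neighbor links that join points of the same sample."""
    pooled = np.vstack([x, y])
    labels = np.r_[np.zeros(len(x), dtype=int), np.ones(len(y), dtype=int)]
    # Pairwise squared Euclidean distances in the pooled sample.
    diff = pooled[:, None, :] - pooled[None, :, :]
    dist = np.einsum("ijk,ijk->ij", diff, diff)
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbor
    nn = np.argsort(dist, axis=1)[:, :k]
    return float(np.mean(labels[nn] == labels[:, None]))

def ensemble_statistic(small, large, k=3, n_subsamples=20, rng=None):
    """Equal-weight average over balanced random subsamples of the larger sample.

    Assumption: equal weights and balanced subsamples; the paper's weighting
    scheme is not reproduced here.
    """
    rng = np.random.default_rng(rng)
    stats = []
    for _ in range(n_subsamples):
        idx = rng.choice(len(large), size=len(small), replace=False)
        stats.append(schilling_statistic(small, large[idx], k))
    return float(np.mean(stats))

def permutation_pvalue(small, large, k=3, n_subsamples=20, n_perm=200, seed=0):
    """Permutation p-value (an assumption; the paper uses an asymptotic null)."""
    rng = np.random.default_rng(seed)
    observed = ensemble_statistic(small, large, k, n_subsamples, rng)
    pooled = np.vstack([small, large])
    n = len(small)
    exceed = 0
    for _ in range(n_perm):
        perm = rng.permutation(len(pooled))
        stat = ensemble_statistic(pooled[perm[:n]], pooled[perm[n:]],
                                  k, n_subsamples, rng)
        exceed += stat >= observed
    return observed, (exceed + 1) / (n_perm + 1)
```

For example, with a smaller sample of 10 points and a larger sample of 50 points in two dimensions, `permutation_pvalue(small, large)` returns the observed ensemble statistic together with a p-value; large values of the statistic indicate that neighbors tend to come from the same sample, which is evidence against equality of the two distributions.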