Endogeneity in Ultrahigh Dimension
53 Pages Posted: 25 Apr 2012
Date Written: April 24, 2012
Most papers on high-dimensional statistics are based on the assumption that none of the regressors are correlated with the regression error, namely, they are exogeneous. Yet, endogeneity arises easily in high-dimensional regression due to a large pool of regressors and this causes the inconsistency of the penalized least-squares methods and possible false scientific discoveries. A necessary condition for model selection of a very general class of penalized regression methods is given, which allows us to prove formally the inconsistency claim. To cope with the possible endogeneity, we construct a novel penalized focussed generalized method of moments (FGMM) criterion function and offer a new optimization algorithm. The FGMM is not a smooth function. To establish its asymptotic properties, we first study the model selection consistency and an oracle property for a general class of penalized regression methods. These results are then used to show that the FGMM possesses an oracle property even in the presence of endogenous predictors, and that the solution is also near global minimum under the over-identification assumption. Finally, we also show how the semi-parametric efficiency of estimation can be achieved via a two-step approach.
Keywords: Focused GMM, Sparsity recovery, Endogenous variables, Oracle property, Conditional moment restriction, Estimating equation, Over identification, Global minimization, Semi-parametric efficiency
JEL Classification: C13, C14
Suggested Citation: Suggested Citation