Efficiency and Consistency for Regularization Parameter Selection in Penalized Regression: Asymptotics and Finite-Sample Corrections
39 Pages Posted: 10 Sep 2013
Date Written: November 1, 2011
This paper studies the asymptotic and finite-sample performance of penalized regression methods when different selectors of the regularization parameter are used, under the assumption that the true model is, or is not, included among the candidate models. In the latter setting, we relax assumptions in the existing theory to show that several classical information criteria are asymptotically efficient selectors of the regularization parameter. In both settings, we assess the finite-sample performance of these as well as other common selectors and demonstrate that their performance can suffer due to sensitivity to the number of variables included in the full model. As alternatives, we propose two corrected information criteria, which are shown to outperform the existing procedures while still maintaining the desired asymptotic properties.

In the non-true model world, we relax the assumption made in the literature that the true error variance is known, or that a consistent estimator is available, to prove that Akaike's information criterion (AIC), Cp and generalized cross-validation (GCV) are themselves asymptotically efficient selectors of the regularization parameter, and we study their performance in finite samples. In classical regression, AIC tends to select overly complex models when the dimension of the maximum candidate model is large relative to the sample size. Simulation studies suggest that AIC suffers from the same shortcoming when used in penalized regression. We therefore propose the use of the classical AICc as an alternative. In the true model world, a similar investigation into the finite-sample properties of BIC reveals analogous overfitting tendencies and leads us to further propose a corrected BIC (BICc).
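For reference, the classical corrected criterion mentioned above (AICc) takes the following standard form in Gaussian regression with $n$ observations and $k$ fitted parameters; carrying it over to the penalized setting by replacing $k$ with an effective degrees of freedom is an assumption of this sketch, not necessarily the paper's exact definition:

```latex
\mathrm{AIC}  = n \log\!\left(\frac{\mathrm{RSS}}{n}\right) + 2k, \qquad
\mathrm{AICc} = \mathrm{AIC} + \frac{2k(k+1)}{\,n - k - 1\,}
```

The correction term grows as $k$ approaches $n$, which is what counteracts AIC's tendency to favor overly complex models in small samples.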
In their respective settings (whether the true model is, or is not, among the candidate models), BICc and AICc have the desired asymptotic properties, and we use simulations to assess their finite-sample performance, as well as that of other selectors, for penalized regressions fit using the smoothly clipped absolute deviation (SCAD) and least absolute shrinkage and selection operator (Lasso) penalty functions. We find that AICc and 10-fold cross-validation outperform the other selectors in terms of squared error loss, and that BICc avoids the tendency of BIC to select overly complex models when the dimension of the maximum candidate model is large relative to the sample size.
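The kind of selection problem studied above can be sketched in a few lines. The following is a minimal, numpy-only illustration, not the authors' implementation: it assumes an orthonormal design (where the Lasso solution reduces to soft-thresholding the OLS coefficients) and uses the number of nonzero coefficients as the effective degrees of freedom in AICc; all names and the simulated data are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 10
X, _ = np.linalg.qr(rng.standard_normal((n, p)))   # orthonormal columns
beta = np.zeros(p)
beta[:3] = [3.0, -2.0, 1.5]                        # sparse "true" signal
y = X @ beta + 0.5 * rng.standard_normal(n)

ols = X.T @ y   # with orthonormal X, these are the OLS coefficients

def select_lambda_by_aicc(lambdas):
    """Pick the Lasso penalty minimizing AICc along a grid of lambdas."""
    best_aicc, best_lam, best_b = np.inf, None, None
    for lam in lambdas:
        # Lasso solution under an orthonormal design: soft-thresholding
        b = np.sign(ols) * np.maximum(np.abs(ols) - lam, 0.0)
        rss = float(np.sum((y - X @ b) ** 2))
        k = int(np.count_nonzero(b))               # assumed effective df
        aic = n * np.log(rss / n) + 2 * k
        aicc = aic + 2 * k * (k + 1) / (n - k - 1) # finite-sample correction
        if aicc < best_aicc:
            best_aicc, best_lam, best_b = aicc, lam, b
    return best_aicc, best_lam, best_b

aicc, lam, b = select_lambda_by_aicc(np.linspace(0.01, 2.0, 50))
print("selected lambda:", lam, "| nonzero coefficients:", np.count_nonzero(b))
```

Selectors such as BIC or 10-fold cross-validation can be compared on the same grid simply by swapping the criterion evaluated inside the loop.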
Keywords: Akaike information criterion, Bayesian information criterion, Least absolute shrinkage and selection operator, Model selection/variable selection, Penalized regression, Smoothly clipped absolute deviation
Suggested Citation: