Supplement to 'Efficiency and Consistency for Regularization Parameter Selection in Penalized Regression: Asymptotics and Finite-Sample Corrections'
13 Pages Posted: 19 Nov 2011
Date Written: November 2, 2011
This paper studies the asymptotic and finite-sample performance of penalized regression methods when different selectors of the regularization parameter are used under the assumption that the true model is, or is not, included among the candidate models. In the latter setting, we relax assumptions in the existing theory to show that several classical information criteria are asymptotically efficient selectors of the regularization parameter. In both settings, we assess the finite-sample performance of these as well as other common selectors and demonstrate that their performance can suffer due to sensitivity to the number of variables that are included in the full model. As alternatives, we propose two corrected information criteria which are shown to outperform the existing procedures while still maintaining the desired asymptotic properties. In the non-true model world, we relax the assumption made in the literature that the true error variance is known or that a consistent estimator is available to prove that Akaike's information criterion (AIC), Cp, and generalized cross-validation (GCV) themselves are asymptotically efficient selectors of the regularization parameter, and we study their performance in finite samples. In classical regression, AIC tends to select overly complex models when the dimension of the maximum candidate model is large relative to the sample size. Simulation studies suggest that AIC suffers from the same shortcomings when used in penalized regression. We therefore propose the use of the classical AICc as an alternative. In the true model world, a similar investigation into the finite-sample properties of BIC reveals analogous overfitting tendencies and leads us to further propose the use of a corrected BIC (BICc).
In their respective settings (whether the true model is, or is not, among the candidate models), BICc and AICc have the desired asymptotic properties, and we use simulations to assess their performance, as well as that of other selectors, in finite samples for penalized regressions fit using the smoothly clipped absolute deviation (SCAD) and least absolute shrinkage and selection operator (Lasso) penalty functions. We find that AICc and 10-fold cross-validation outperform the other selectors in terms of squared error loss, and BICc avoids the tendency of BIC to select overly complex models when the dimension of the maximum candidate model is large relative to the sample size.
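As a rough illustration of what selecting the regularization parameter with an information criterion involves, the Python sketch below scores each fit along a regularization path and keeps the minimizer. The formulas are common textbook forms for Gaussian regression with additive constants dropped and are not necessarily the exact definitions used in the paper. BICc is not reproduced because its form is not stated here. The function names, the `path` values, and the use of the number of nonzero coefficients as the degrees of freedom are all assumptions made for illustration.

```python
import math

# Each criterion takes the residual sum of squares (rss), the sample
# size (n), and the effective degrees of freedom (df) of one fit.
# For the Lasso, df is commonly taken as the number of nonzero
# coefficients; that convention is assumed here.

def aic(rss, n, df):
    # AIC on the log-variance scale, constants dropped.
    return math.log(rss / n) + 2.0 * df / n

def bic(rss, n, df):
    # BIC on the same scale; the penalty grows with log(n).
    return math.log(rss / n) + math.log(n) * df / n

def aicc(rss, n, df):
    # Classical corrected AIC; undefined once df reaches n - 2,
    # so such fits are given an infinite score.
    if n - df - 2 <= 0:
        return math.inf
    return math.log(rss / n) + (n + df) / (n - df - 2)

def gcv(rss, n, df):
    # Generalized cross-validation score.
    if df >= n:
        return math.inf
    return (rss / n) / (1.0 - df / n) ** 2

def select(path, n, criterion):
    # path: iterable of (lambda, rss, df) tuples, one per fit.
    # Returns the tuple whose criterion value is smallest.
    return min(path, key=lambda fit: criterion(fit[1], n, fit[2]))

# Hypothetical path: smaller lambda gives a larger, better-fitting model.
path = [(1.0, 40.0, 2), (0.5, 30.0, 5), (0.1, 6.0, 12)]
n = 20

chosen_by_aic = select(path, n, aic)
chosen_by_aicc = select(path, n, aicc)
```

With these made-up numbers, where the largest fit uses 12 degrees of freedom on 20 observations, `aic` picks the largest fit while `aicc` picks the smallest. This is the kind of behaviour the abstract describes when the maximum candidate model is large relative to the sample size.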
Keywords: Akaike information criterion; Bayesian information criterion; Least absolute shrinkage and selection operator; Model selection/Variable selection; Penalized regression; Smoothly clipped absolute deviation.