Ridge regression under dense factor augmented models
81 Pages · Posted: 13 Nov 2020 · Last revised: 2 Aug 2021
Date Written: July 31, 2021
This paper establishes a comprehensive theory of optimality, robustness, and cross-validation selection consistency for ridge regression under factor-augmented models with possibly dense idiosyncratic information. Using spectral analysis of random matrices, we show that ridge regression is asymptotically efficient in capturing both factor and idiosyncratic information: it minimizes the limiting predictive loss within the entire class of spectral regularized estimators under large-dimensional factor models and the mixed-effects hypothesis. We derive an asymptotically optimal ridge penalty in closed form and prove that a bias-corrected k-fold cross-validation procedure selects the best ridge penalty adaptively in large samples. We extend the theory to autoregressive models with many exogenous variables and establish a consistent cross-validation procedure using what we call the double ridge regression method. Our results allow for non-parametric distributions of martingale difference errors and idiosyncratic random coefficients, and adapt to both the cross-sectional and temporal dependence structures of the large-dimensional predictors. A simulation study demonstrates the efficiency of the ridge regression estimators under dense models and their robustness against sparsity. We apply our methods to forecast the growth rates of US industrial production and the S&P 500 index using the FRED-MD database. Our double ridge regression improves on the autoregressive prediction and consistently outperforms principal component regression and the factor-adjusted LASSO regression.
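The core pipeline the abstract describes — ridge regression on a high-dimensional factor-augmented design, with the penalty chosen by k-fold cross-validation — can be sketched as follows. This is a minimal NumPy illustration, not the paper's method: it uses plain (uncorrected) cross-validation rather than the bias-corrected procedure the authors prove consistent, and the simulated dense data-generating process, function names, and penalty grid are all illustrative assumptions.

```python
import numpy as np

def ridge_fit(X, y, lam):
    # Closed-form ridge estimator: (X'X + lam * I)^{-1} X'y
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

def cv_ridge(X, y, lams, k=5, seed=0):
    # Plain k-fold cross-validation over candidate penalties
    # (no bias correction, unlike the paper's procedure).
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    losses = []
    for lam in lams:
        sse = 0.0
        for fold in folds:
            mask = np.ones(len(y), dtype=bool)
            mask[fold] = False
            b = ridge_fit(X[mask], y[mask], lam)
            sse += np.sum((y[fold] - X[fold] @ b) ** 2)
        losses.append(sse / len(y))
    return lams[int(np.argmin(losses))]

# Simulated dense factor-augmented design: predictors load on a few
# latent factors, and the outcome depends on many individually small
# (dense, non-sparse) idiosyncratic coefficients.
rng = np.random.default_rng(1)
n, p, r = 200, 100, 3
F = rng.standard_normal((n, r))             # latent factors
Lam = rng.standard_normal((p, r))           # factor loadings
X = F @ Lam.T + rng.standard_normal((n, p))
beta = rng.standard_normal(p) / np.sqrt(p)  # dense coefficients
y = X @ beta + rng.standard_normal(n)

lam_star = cv_ridge(X, y, np.geomspace(0.1, 1000.0, 20))
```

A dense `beta` like the one above is the regime where ridge shrinkage is expected to beat sparsity-oriented methods such as the LASSO, matching the robustness comparison reported in the simulation study.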
Keywords: High-dimensional linear model; random matrix theory; Tikhonov regularization; mixed-effects model; cross-validation
JEL Classification: C51, C55, C22