Interpretable Sparse Proximate Factors for Large Dimensions

81 Pages. Posted: 21 May 2018. Last revised: 8 Jul 2021

Markus Pelger

Stanford University - Department of Management Science & Engineering

Ruoxuan Xiong

Emory University

Date Written: July 31, 2020


This paper proposes sparse and easy-to-interpret proximate factors to approximate statistical latent factors. Latent factors in a large-dimensional factor model can be estimated by principal component analysis (PCA), but they are usually hard to interpret. We obtain proximate factors that are easier to interpret by shrinking the PCA factor weights and setting all of them to zero except for those with the largest absolute values. We show that proximate factors constructed with only 5-10% of the data are usually sufficient to almost perfectly replicate the population and PCA factors, without actually assuming a sparse structure in the weights or loadings. Using extreme value theory, we explain why sparse proximate factors can be substitutes for non-sparse PCA factors. We derive analytical asymptotic bounds for the correlation of appropriately rotated proximate factors with the population factors. These bounds provide guidance on how to construct the proximate factors. In simulations and empirical analyses of financial portfolio and macroeconomic data, we illustrate that sparse proximate factors are close substitutes for PCA factors, with average correlations of around 97.5%, while being interpretable.
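The thresholding step described in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function name `proximate_factors`, the unit-length rescaling of the sparse weights, and the simulated one-factor panel (dimensions, Gaussian draws, seed) are all assumptions made for illustration. A single factor is used so that no rotation is needed when comparing the proximate factor with the PCA factor.

```python
import numpy as np

def proximate_factors(X, k, m):
    """Illustrative sketch (hypothetical helper, not the paper's code).

    X: T x N data panel, k: number of factors,
    m: number of nonzero weights kept per factor.
    """
    X = X - X.mean(axis=0)                      # demean each cross-sectional unit
    T, N = X.shape
    # PCA weights: eigenvectors of the N x N sample covariance matrix.
    _, eigvec = np.linalg.eigh(X.T @ X / T)     # eigh sorts eigenvalues ascending
    W = eigvec[:, ::-1][:, :k]                  # top-k eigenvectors
    F_pca = X @ W                               # PCA factor estimates (T x k)

    W_sparse = np.zeros_like(W)
    for j in range(k):
        keep = np.argsort(np.abs(W[:, j]))[-m:] # indices of the m largest |weights|
        W_sparse[keep, j] = W[keep, j]          # all other weights stay at zero
        W_sparse[:, j] /= np.linalg.norm(W_sparse[:, j])  # assumed rescaling
    F_prox = X @ W_sparse                       # proximate factors (T x k)
    return F_pca, F_prox, W_sparse

# Simulated one-factor panel (assumed design, for illustration only).
rng = np.random.default_rng(0)
T, N, k, m = 200, 100, 1, 10                    # m/N = 10% of the cross-section
F = rng.standard_normal((T, k))
Lam = rng.standard_normal((N, k))
X = F @ Lam.T + rng.standard_normal((T, N))

F_pca, F_prox, W_sparse = proximate_factors(X, k, m)
corr = abs(np.corrcoef(F_pca[:, 0], F_prox[:, 0])[0, 1])
print(f"nonzero weights: {np.count_nonzero(W_sparse[:, 0])}, |corr| = {corr:.3f}")
```

In this simulated design, the nonzero entries of `W_sparse` identify the few cross-sectional units that make up the proximate factor, which is what makes it easier to interpret than a dense PCA weight vector. With more than one factor, the comparison would need a rotation-invariant measure rather than a simple pairwise correlation.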

Keywords: Factor Analysis, Principal Components, Sparse Loading, Interpretability, Large-Dimensional Panel Data, Large N and T

JEL Classification: C14, C38, C55, G12

Suggested Citation

Pelger, Markus and Xiong, Ruoxuan, Interpretable Sparse Proximate Factors for Large Dimensions (July 31, 2020). Available at SSRN: or

Markus Pelger (Contact Author)

Stanford University - Department of Management Science & Engineering

473 Via Ortega
Stanford, CA 94305-9025
United States

Ruoxuan Xiong

Emory University

36 Eagle Row
Atlanta, GA 30322-0001
United States
4707273668 (Phone)
