An Information-Theoretic Approach to Dimension Reduction of Financial Data
13 Pages. Posted: 4 May 2013. Last revised: 2 Jun 2020
Date Written: June 3, 2013
The task of statistically analysing and understanding high-dimensional financial data sets is increasingly pertinent in an age of burgeoning information. With high-frequency measurements and a global investment universe of hundreds of thousands of securities, reducing the dimension of large data sets by projecting them onto a smaller set of dominant underlying factors or components is often a first step. While principal component analysis has been a standard dimension reduction tool for many decades, a theoretically sound measure of the number of components that should be retained has been lacking. Here we show that the effective rank offers a potential model-independent solution to the problem. We demonstrate that the explanatory power of the number of components indicated by the effective rank is remarkably stable for a wide range of global financial market data, while the effective rank itself can vary dramatically over time, offering a potential indicator of systemic risk. The results suggest a certain universality to the measure; we provide some theoretical results supporting this view, derive lower bounds for its explanatory power, and highlight links to measures of diversification in areas ranging from ecology to quantum mechanics. Our results demonstrate that the time-varying drivers of financial markets do exhibit some persistent structure. We anticipate that our results will prompt further investigation of the effective rank in principal component analysis, given the latter's wide appeal in diverse fields of research ranging from psychology to atmospheric science. We also hope our results provide some direction for solving related dimensionality problems, such as in cluster analysis, where the longstanding question of how many clusters should be used remains unanswered.
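As a rough illustration of the idea in the abstract, the sketch below computes an effective rank from the eigenvalues of a correlation matrix and reports the share of variance explained by that many principal components. It assumes the common entropy-based definition (the exponential of the Shannon entropy of the normalised eigenvalues), which matches the keywords but is not spelled out in the abstract. The function names, the use of the correlation matrix, and the rounding-up rule for the number of retained components are all illustrative assumptions, not the authors' stated method.

```python
import numpy as np


def effective_rank(eigenvalues):
    """Exponential of the Shannon entropy of the normalised eigenvalues.

    Assumption: this entropy-based definition is the one intended;
    the abstract does not give a formula.
    """
    lam = np.asarray(eigenvalues, dtype=float)
    lam = lam[lam > 0]            # zero eigenvalues contribute nothing (0 * log 0 = 0)
    p = lam / lam.sum()           # normalise to a probability distribution
    entropy = -np.sum(p * np.log(p))
    return float(np.exp(entropy))


def explained_by_effective_rank(returns):
    """Return (effective rank, components kept, variance share explained).

    `returns` is a hypothetical T x N array: T observations of N assets.
    """
    corr = np.corrcoef(returns, rowvar=False)
    eig = np.linalg.eigvalsh(corr)[::-1]   # eigenvalues, largest first
    eig = np.clip(eig, 0.0, None)          # guard against tiny negative round-off
    erank = effective_rank(eig)
    k = int(np.ceil(erank))                # assumption: round up to a whole component count
    explained = eig[:k].sum() / eig.sum()
    return erank, k, float(explained)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic data only: one common factor plus idiosyncratic noise.
    factor = rng.standard_normal((500, 1))
    returns = 0.7 * factor + rng.standard_normal((500, 10))
    print(explained_by_effective_rank(returns))
```

Under this definition the measure equals the number of eigenvalues when they are all equal and falls to 1 when a single eigenvalue carries all the variance, so a drop in the effective rank over a rolling window would indicate that fewer factors are driving the market.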
Keywords: Systemic risk, principal component, entropy, effective number, effective support, information theory
JEL Classification: C10, C40, C49, E66, G00