Risk Factors That Matter: Textual Analysis of Risk Disclosures for the Cross-Section of Returns
89 Pages. Posted: 20 Jan 2019. Last revised: 3 Dec 2019.
Date Written: November 2019
I exploit unsupervised machine learning and natural language processing techniques to elicit the risk factors that firms themselves identify in their annual reports. I quantify each firm's exposure to every identified risk, design an econometric test to classify each risk as either systematic or idiosyncratic, and construct factor-mimicking portfolios that proxy for each undiversifiable source of risk. The portfolios are priced in the cross-section and contain information beyond that captured by the commonly used multi-factor representations. A model that uses only firm-identified risk factors (FIRFs) performs at least as well as traditional factor models, despite not using any information from past prices or returns.
Keywords: Cross-Section of Returns, Factor Models, Machine Learning, Big Data, LDA, Text Analysis, NLP
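As a rough illustration of the factor-mimicking step described in the abstract, the sketch below forms a high-minus-low portfolio return from firm-level exposures to a single risk topic. It is only a minimal sketch under stated assumptions: the function name `long_short_return`, the equal weighting, the 30% cutoff, and the toy exposures and returns are all hypothetical and are not taken from the paper, which may construct its portfolios differently. The exposures are assumed to be precomputed, for example as topic weights from an LDA model fitted to risk disclosures.

```python
from statistics import mean


def long_short_return(exposures, returns, quantile=0.3):
    """Equal-weighted high-minus-low portfolio return for one period.

    exposures: dict mapping firm -> exposure to one risk topic
               (assumed precomputed, e.g. an LDA topic weight)
    returns:   dict mapping firm -> realized return over the next period
    quantile:  fraction of firms in each leg (hypothetical 30% default)
    """
    # Keep only firms with both an exposure and a return, sorted by exposure.
    firms = sorted((f for f in exposures if f in returns),
                   key=lambda f: exposures[f])
    if not firms:
        raise ValueError("no firm has both an exposure and a return")

    # Size of each leg; at least one firm per leg.
    k = max(1, int(len(firms) * quantile))
    low, high = firms[:k], firms[-k:]

    # Long the most exposed firms, short the least exposed ones.
    return mean(returns[f] for f in high) - mean(returns[f] for f in low)


# Toy data, invented purely for illustration.
toy_exposures = {"A": 0.1, "B": 0.2, "C": 0.5, "D": 0.9}
toy_returns = {"A": 0.01, "B": 0.02, "C": 0.03, "D": 0.05}

factor_return = long_short_return(toy_exposures, toy_returns, quantile=0.5)
```

Repeating this calculation period by period would give a return series for one candidate factor. Note that the sort uses only text-derived exposures, which matches the abstract's point that no information from past prices or returns enters the construction.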