Textual Factors: A Scalable, Interpretable, and Data-driven Approach to Analyzing Unstructured Information
42 Pages
Posted: 4 Jan 2019
Date Written: December 27, 2018
Modern firms leverage big, unstructured data, particularly text, to originate loans, predict asset returns, improve customer service, and more. Moreover, interpretable textual information sheds light on key economic mechanisms and explanatory variables. We therefore develop a general framework for analyzing large-scale text-based data that combines the strengths of neural network language models, such as word embeddings, with those of generative statistical models, such as topic models. Our data-driven approach captures complex linguistic structures while ensuring computational scalability and economic interpretability. We also discuss applications of our methodology to issues in finance and economics, such as forecasting or backfilling asset returns or macroeconomic outcomes, interpreting existing models, and creating new domain knowledge to expand the frontier of analysis.
Keywords: Big Data, Factor Models, Machine Learning, Return Predictability, Text-based Analysis, Topic Models, Unstructured Data.
JEL Classification: C55, C80, G10