Textual Factors: A Scalable, Interpretable, and Data-driven Approach to Analyzing Unstructured Information

42 pages. Posted: 4 Jan 2019

Lin William Cong

University of Chicago - Booth School of Business

Tengyuan Liang

University of Chicago Booth School of Business

Xiao Zhang

University of Chicago - Booth School of Business

Date Written: December 27, 2018

Abstract

Modern firms leverage large, unstructured data, particularly text, to originate loans, predict asset returns, improve customer service, and more. Moreover, interpretable textual information sheds light on key economic mechanisms and explanatory variables. We therefore develop a general framework for analyzing large-scale text-based data that combines the strengths of neural network language models, such as word embeddings, with those of generative statistical models, such as topic models. Our data-driven approach captures complex linguistic structures while ensuring computational scalability and economic interpretability. We also discuss applications of our methodology to issues in finance and economics, such as forecasting or backfilling asset returns or macroeconomic outcomes, interpreting existing models, and creating new domain knowledge to expand the frontier of analysis.

Keywords: Big Data, Factor Models, Machine Learning, Return Predictability, Text-based Analysis, Topic Models, Unstructured Data.

JEL Classification: C55, C80, G10

Suggested Citation

Cong, Lin and Liang, Tengyuan and Zhang, Xiao, Textual Factors: A Scalable, Interpretable, and Data-driven Approach to Analyzing Unstructured Information (December 27, 2018). Available at SSRN: https://ssrn.com/abstract=3307057 or http://dx.doi.org/10.2139/ssrn.3307057

Lin Cong (Contact Author)

University of Chicago - Booth School of Business

5807 S. Woodlawn Avenue
Chicago, IL 60637
United States

Tengyuan Liang

University of Chicago Booth School of Business

Xiao Zhang

University of Chicago - Booth School of Business

5807 S. Woodlawn Avenue
Chicago, IL 60637
United States
