Analyzing Textual Information at Scale
34 Pages. Posted: 17 Sep 2019. Last revised: 9 Dec 2019
Date Written: Nov 1, 2019
We survey recent advances in textual analysis for the social sciences. Count-based economic models, structured statistical tools, and plain-vanilla machine learning methods each have merits and limitations. Capturing complex linguistic structures in a data-driven way, while ensuring computational scalability and economic interpretability, requires a general framework for analyzing large-scale text-based data. We discuss recent attempts to combine the strengths of neural network language models, such as word embeddings, with those of generative statistical models, such as topic models. We also describe typical sources of text, applications of these methodologies to issues in finance and economics, and promising future directions.
Keywords: Big Data, Machine Learning, Text-based Analysis, Topic Models, Unstructured Data, Word Embedding