Analyzing Textual Information at Scale

34 Pages. Posted: 17 Sep 2019. Last revised: 9 Dec 2019.

Lin William Cong

Cornell University

Tengyuan Liang

University of Chicago - Booth School of Business

Baozhong Yang

Georgia State University - Robinson College of Business

Xiao Zhang

University of Chicago - Booth School of Business

Date Written: Nov 1, 2019

Abstract

We provide an overview of recent advances in textual analysis for the social sciences. Count-based economic models, structured statistical tools, and plain-vanilla machine learning apparatuses each have merits and limitations. To take a data-driven approach that captures complex linguistic structures while ensuring computational scalability and economic interpretability, a general framework for analyzing large-scale text-based data is needed. We discuss recent attempts to combine the strengths of neural network language models, such as word embedding, with those of generative statistical modeling, such as topic modeling. We also describe typical sources of text, applications of these methodologies to issues in finance and economics, and promising future directions.

Keywords: Big Data, Machine Learning, Text-based Analysis, Topic Models, Unstructured Data, Word Embedding

Suggested Citation

Cong, Lin and Liang, Tengyuan and Yang, Baozhong and Zhang, Xiao, Analyzing Textual Information at Scale (Nov 1, 2019). Available at SSRN: https://ssrn.com/abstract=3449822 or http://dx.doi.org/10.2139/ssrn.3449822

Lin Cong (Contact Author)

Cornell University

Ithaca, NY 14853
United States

HOME PAGE: http://www.linwilliamcong.org

Tengyuan Liang

University of Chicago - Booth School of Business

Baozhong Yang

Georgia State University - Robinson College of Business

35 Broad Street
Atlanta, GA 30303-3083
United States
404-413-7350 (Phone)
404-413-7312 (Fax)

HOME PAGE: http://www2.gsu.edu/~fncbyy/index.html

Xiao Zhang

University of Chicago - Booth School of Business

5807 S. Woodlawn Avenue
Chicago, IL 60637
United States
