Choosing News Topics to Explain Stock Market Returns

ACM International Conference on AI in Finance 2020

8 Pages. Posted: 19 Nov 2020. Last revised: 14 Jun 2021.

Paul Glasserman

Columbia University - Columbia Business School

Kriste Krstovski

Columbia University

Paul-Robert Laliberte

Georgia Institute of Technology

Harry Mamaysky

Columbia University - Columbia Business School

Date Written: October 6, 2020

Abstract

We analyze methods for selecting topics in news articles to explain stock returns. We find, through empirical and theoretical results, that supervised Latent Dirichlet Allocation (sLDA) implemented through Gibbs sampling in a stochastic EM algorithm will often overfit returns to the detriment of the topic model. We obtain better out-of-sample performance through a random search of plain LDA models. A branching procedure that reinforces effective topic assignments often performs best. We test these methods on an archive of over 90,000 news articles about S&P 500 firms.
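As a rough illustration of the random-search idea in the abstract, the sketch below fits plain LDA several times with different random seeds, regresses returns on the resulting document-topic proportions, and keeps the run with the best held-out fit. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names (`fit_lda_gibbs`, `out_of_sample_r2`, `random_search_lda`), the hyperparameters `alpha` and `beta`, the use of R-squared as the score, and the simplification of fitting topics on all documents while using returns only from the training split are all hypothetical choices made for illustration. The branching procedure mentioned in the abstract is not shown.

```python
import numpy as np

def fit_lda_gibbs(docs, n_topics, vocab_size, n_iter, seed, alpha=0.1, beta=0.01):
    """Collapsed Gibbs sampler for plain (unsupervised) LDA.

    docs is a list of lists of integer word ids. Returns an array of
    document-topic proportions with one row per document.
    """
    rng = np.random.default_rng(seed)
    doc_topic = np.zeros((len(docs), n_topics))
    topic_word = np.zeros((n_topics, vocab_size))
    topic_total = np.zeros(n_topics)
    assignments = []
    # Random initial topic assignment for every word token.
    for d, doc in enumerate(docs):
        z = rng.integers(n_topics, size=len(doc))
        assignments.append(z)
        for w, k in zip(doc, z):
            doc_topic[d, k] += 1
            topic_word[k, w] += 1
            topic_total[k] += 1
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            z = assignments[d]
            for i, w in enumerate(doc):
                k = z[i]
                # Remove the token, resample its topic, add it back.
                doc_topic[d, k] -= 1
                topic_word[k, w] -= 1
                topic_total[k] -= 1
                p = (doc_topic[d] + alpha) * (topic_word[:, w] + beta)
                p = p / (topic_total + vocab_size * beta)
                k = rng.choice(n_topics, p=p / p.sum())
                z[i] = k
                doc_topic[d, k] += 1
                topic_word[k, w] += 1
                topic_total[k] += 1
    theta = doc_topic + alpha
    return theta / theta.sum(axis=1, keepdims=True)

def out_of_sample_r2(theta, returns, train_idx, test_idx):
    """Fit OLS of returns on topic proportions on train, score R^2 on test."""
    # Drop the last topic: proportions sum to one, so it is collinear
    # with the intercept.
    X = np.column_stack([np.ones(len(theta)), theta[:, :-1]])
    coef = np.linalg.lstsq(X[train_idx], returns[train_idx], rcond=None)[0]
    resid = returns[test_idx] - X[test_idx] @ coef
    centered = returns[test_idx] - returns[test_idx].mean()
    return 1.0 - (resid @ resid) / (centered @ centered)

def random_search_lda(docs, returns, n_topics, vocab_size,
                      train_idx, val_idx, seeds, n_iter=50):
    """Keep the LDA run (seed) whose topics best explain held-out returns."""
    best = None
    for seed in seeds:
        theta = fit_lda_gibbs(docs, n_topics, vocab_size, n_iter, seed)
        score = out_of_sample_r2(theta, returns, train_idx, val_idx)
        if best is None or score > best[0]:
            best = (score, seed, theta)
    return best
```

Because each LDA fit never sees the returns, the only supervision enters through the choice among candidate models, which is one way to read the contrast the abstract draws with sLDA. The seed selected on the validation split should still be checked on a separate test split, since picking the best of many runs can itself overfit.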

Keywords: text analysis, finance, supervised topic models

JEL Classification: G10, G14

Suggested Citation

Glasserman, Paul and Krstovski, Kriste and Laliberte, Paul-Robert and Mamaysky, Harry, Choosing News Topics to Explain Stock Market Returns (October 6, 2020). ACM International Conference on AI in Finance 2020, Available at SSRN: https://ssrn.com/abstract=3705738

Paul Glasserman

Columbia University - Columbia Business School

3022 Broadway
403 Uris Hall
New York, NY 10027
United States
212-854-4102 (Phone)
212-316-9180 (Fax)

Kriste Krstovski

Columbia University

3022 Broadway
New York, NY 10027
United States

Paul-Robert Laliberte

Georgia Institute of Technology

Atlanta, GA 30332
United States

Harry Mamaysky (Contact Author)

Columbia University - Columbia Business School

3022 Broadway
New York, NY 10027
United States
