Predicting Returns with Text Data

66 Pages Posted: 20 May 2019 Last revised: 17 Aug 2021

Zheng Tracy Ke

Harvard University

Bryan T. Kelly

Yale SOM; AQR Capital Management, LLC; National Bureau of Economic Research (NBER)

Dacheng Xiu

University of Chicago - Booth School of Business

There are 2 versions of this paper.

Date Written: September 30, 2020


We introduce a new text-mining methodology that extracts information from news articles to predict asset returns. Unlike more common sentiment scores used for stock return prediction (e.g., those sold by commercial vendors or built with dictionary-based methods), our supervised learning framework constructs a score that is specifically adapted to the problem of return prediction. Our method proceeds in three steps: 1) isolating a list of terms via predictive screening, 2) assigning prediction weights to these words via topic modeling, and 3) aggregating terms into an article-level predictive score via penalized likelihood. We derive theoretical guarantees on the accuracy of estimates from our model with minimal assumptions. In our empirical analysis, we study one of the most actively monitored streams of news articles in the financial system--the Dow Jones Newswires--and show that our supervised text model excels at extracting return-predictive signals in this context. Information in newswires is assimilated into prices with an inefficient delay that is broadly consistent with limits-to-arbitrage (i.e., more severe for smaller and more volatile firms) yet can be exploited in a real-time trading strategy with reasonable turnover and net of transaction costs.

Keywords: Text Mining, Machine Learning, Return Predictability, Sentiment Analysis, Screening, Topic Modeling, Penalized Likelihood
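
The three steps named in the abstract can be illustrated with a minimal sketch on toy data. This is not the authors' implementation: the vocabulary, the counts, the returns, the thresholds `alpha` and `kappa`, the penalty weight `lam`, the use of return ranks as a sentiment proxy, and the grid search are all assumptions made for illustration.

```python
import numpy as np

def screen_terms(D, y, alpha=0.2, kappa=2):
    """Step 1 (screening): keep words that co-occur mostly with one return sign.
    alpha and kappa are hypothetical tuning values."""
    contains = D > 0                                  # article i mentions word j
    k = contains.sum(axis=0)                          # articles mentioning each word
    pos = (contains & (y > 0)[:, None]).sum(axis=0)   # ...that had a positive return
    f = pos / np.maximum(k, 1)                        # share of positive-return articles
    keep = (k >= kappa) & ((f >= 0.5 + alpha) | (f <= 0.5 - alpha))
    return np.flatnonzero(keep)

def fit_topics(D, y, S):
    """Step 2 (two-topic model): regress word frequencies on a rank-based
    sentiment proxy to get one positive and one negative word distribution."""
    DS = D[:, S].astype(float)
    s = DS.sum(axis=1)
    ok = s > 0                                        # skip articles with no screened words
    freq = DS[ok] / s[ok, None]
    ranks = y.argsort().argsort() + 1                 # 1 = lowest return
    p = (ranks / len(y))[ok]                          # assumed sentiment proxy in (0, 1]
    W = np.column_stack([p, 1 - p])
    O, *_ = np.linalg.lstsq(W, freq, rcond=None)      # rows: positive topic, negative topic
    O = np.clip(O, 0, None)                           # word weights cannot be negative
    return O / np.maximum(O.sum(axis=1, keepdims=True), 1e-12)

def score_article(d, S, O, lam=0.5):
    """Step 3 (penalized likelihood): choose the mixing weight p that maximizes
    the per-word log-likelihood plus lam * log(p * (1 - p))."""
    dS = d[S].astype(float)
    if dS.sum() == 0:
        return 0.5                                    # no screened words: neutral score
    grid = np.linspace(0.01, 0.99, 99)
    mix = np.outer(grid, O[0]) + np.outer(1 - grid, O[1])
    loglik = (np.log(mix + 1e-12) @ dS) / dS.sum()
    penalty = lam * np.log(grid * (1 - grid))         # shrinks the score toward 0.5
    return float(grid[np.argmax(loglik + penalty)])

# Toy data (entirely made up): rows are articles, columns are word counts.
vocab = ["gain", "loss", "the", "report"]
D = np.array([[3, 0, 2, 1],
              [2, 0, 1, 1],
              [1, 0, 2, 0],
              [0, 1, 2, 1],
              [0, 2, 1, 0],
              [0, 3, 2, 1]])
y = np.array([0.03, 0.02, 0.01, -0.01, -0.02, -0.03])  # realized returns per article

S = screen_terms(D, y)
O = fit_topics(D, y, S)
print([vocab[j] for j in S])
print(score_article(np.array([2, 0, 1, 0]), S, O))
print(score_article(np.array([0, 2, 1, 0]), S, O))
```

In this sketch the penalty term pulls scores toward the neutral value 0.5, so an article with only a few sentiment-charged words does not receive an extreme score; an article with none of the screened words is simply assigned 0.5.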

Suggested Citation

Ke, Zheng and Kelly, Bryan T. and Xiu, Dacheng, Predicting Returns with Text Data (September 30, 2020). University of Chicago, Becker Friedman Institute for Economics Working Paper No. 2019-69, Yale ICF Working Paper No. 2019-10, Chicago Booth Research Paper No. 20-37, Available at SSRN.

Zheng Ke

Harvard University

1875 Cambridge Street
Cambridge, MA 02138
United States

Bryan T. Kelly

Yale SOM

135 Prospect Street
P.O. Box 208200
New Haven, CT 06520-8200
United States

AQR Capital Management, LLC

Greenwich, CT
United States

National Bureau of Economic Research (NBER)

1050 Massachusetts Avenue
Cambridge, MA 02138
United States

Dacheng Xiu (Contact Author)

University of Chicago - Booth School of Business

5807 S. Woodlawn Avenue
Chicago, IL 60637
United States
