How Much Can Machines Learn Finance From Chinese Text Data?

57 Pages Posted: 4 Mar 2021

Jianqing Fan

Princeton University - Bendheim Center for Finance

Lirong Xue

Princeton University - Department of Operations Research & Financial Engineering (ORFE)

Yang Zhou

Institute for Big Data, Fudan University

Date Written: January 11, 2021

Abstract

Most studies of equity markets that use text data rely on English-language sentiment dictionaries or topic modeling. Can we instead predict the impact of news directly from the text, and how much can we learn from such a direct approach? We present a new framework for learning from text data, called FarmPredict, which is based on a factor model and sparsity regularization and lets machines learn financial returns automatically. Unlike dictionary-based or topic models, which apply stringent pre-screening processes, our framework allows the model to extract information more fully from the whole article. We demonstrate our study on the Chinese stock market because Chinese text has no natural spaces between words and phrases and because the Chinese market has a very large proportion of retail investors. These two features distinguish our study from the previous literature, which focuses on English text and the U.S. market. We validate our method against several existing approaches in the literature on the Chinese stock market. We show that news scored as positive by FarmPredict generates daily excess stock returns of 83 bps on average, while negative news has an adverse impact of 26 bps, on the days of news announcements; both effects can last for a few days. This asymmetric effect aligns well with the short-sale constraints in the Chinese equity market. We further show that the machine-learned sentiments provide sizeable predictive power, yielding an annualized return of 116% under a simple investment strategy, and that portfolios based on our model significantly outperform those based on other models. This lends further support to the view that FarmPredict can learn the sentiments embedded in financial news. Our study also demonstrates the far-reaching potential of using machines to learn from text data.
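
The abstract names two ingredients, a factor model and sparsity regularization, without giving details. The following is only a rough sketch of how those two ingredients can be combined, not the authors' implementation: it assumes factors are estimated by principal components of a document-term matrix, that the factor coefficients are left unpenalized, and that an L1 penalty is applied to the idiosyncratic (factor-removed) word components. The function names, the synthetic data, and the values of `n_factors` and `lam` are hypothetical choices made for illustration.

```python
import numpy as np

def soft_threshold(z, t):
    """Elementwise soft-thresholding, the proximal map of the L1 penalty."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def fit_factor_sparse(X, y, n_factors=2, lam=0.05, n_iter=500):
    """Illustrative factor-augmented sparse regression (assumed form).

    X: (n_docs, n_words) matrix of word features; y: (n_docs,) returns.
    """
    n = X.shape[0]
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean

    # Step 1: estimate latent factors by principal components (via SVD).
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    loadings = Vt[:n_factors]            # (k, n_words)
    F = Xc @ loadings.T                  # (n_docs, k) factor scores

    # Step 2: idiosyncratic components, i.e. what the factors do not explain.
    U = Xc - F @ loadings

    # Step 3: unpenalized least squares of returns on the factors.
    gamma = np.linalg.lstsq(F, yc, rcond=None)[0]
    resid = yc - F @ gamma

    # Step 4: lasso of the remaining returns on U, solved by proximal
    # gradient descent (ISTA) with step size 1/L.
    L = max(np.linalg.norm(U, 2) ** 2 / n, 1e-12)
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = -U.T @ (resid - U @ beta) / n
        beta = soft_threshold(beta - grad / L, lam / L)

    return {"x_mean": x_mean, "y_mean": y_mean, "loadings": loadings,
            "gamma": gamma, "beta": beta}

def predict(model, X_new):
    """Score new documents with the fitted factor and sparse parts."""
    Xc = X_new - model["x_mean"]
    F = Xc @ model["loadings"].T
    U = Xc - F @ model["loadings"]
    return model["y_mean"] + F @ model["gamma"] + U @ model["beta"]

# Synthetic stand-in for a document-term matrix and next-day returns.
rng = np.random.default_rng(0)
n_docs, n_words = 200, 30
latent = rng.normal(size=(n_docs, 2))
X = latent @ rng.normal(size=(2, n_words)) + rng.normal(size=(n_docs, n_words))
y = 0.5 * latent[:, 0] + 0.8 * X[:, 3] + 0.1 * rng.normal(size=n_docs)

model = fit_factor_sparse(X, y, n_factors=2, lam=0.05)
scores = predict(model, X)
print("nonzero word coefficients:", int(np.count_nonzero(model["beta"])))
```

Because the principal-component factors and the idiosyncratic components are orthogonal in sample, fitting the factor part first and then the penalized part gives the same result as fitting both jointly with the penalty on the word coefficients only. In this sketch the predicted values in `scores` play the role of per-article sentiment scores.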

Keywords: Machine Learning, Factor Model, Sparse Regression, Textual Analysis, Sentiment Scores, Event Studies, Financial Returns

JEL Classification: C53, C55, C58, G10, G11, G12, G14

Suggested Citation

Fan, Jianqing and Xue, Lirong and Zhou, Yang, How Much Can Machines Learn Finance From Chinese Text Data? (January 11, 2021). Available at SSRN: https://ssrn.com/abstract=3765862 or http://dx.doi.org/10.2139/ssrn.3765862

Jianqing Fan (Contact Author)

Princeton University - Bendheim Center for Finance

26 Prospect Avenue
Princeton, NJ 08540
United States
609-258-7924 (Phone)
609-258-8551 (Fax)

HOME PAGE: http://orfe.princeton.edu/~jqfan/

Lirong Xue

Princeton University - Department of Operations Research & Financial Engineering (ORFE)

Sherrerd Hall, Charlton Street
Princeton, NJ 08544
United States

Yang Zhou

Institute for Big Data, Fudan University

No. 220 Handan Road
Shanghai, 200433
China
