Expected Returns and Large Language Models
69 Pages · Posted: 21 Apr 2023 · Last revised: 13 Sep 2023
Date Written: November 22, 2022
We extract contextualized representations of news text with state-of-the-art large language models to predict returns. Unlike traditional word-based methods, e.g., bag-of-words or word vectors, contextualized representations capture both the syntax and semantics of text, providing a more comprehensive understanding of its meaning. Notably, word-based approaches are more susceptible to errors when negation words are present in news articles. Our study covers 16 international equity markets and news articles in 13 languages, providing polyglot evidence of news-induced return predictability. We find that information in newswires is incorporated into prices with an inefficient delay consistent with limits to arbitrage, yet one that can still be exploited in real-time trading strategies. Additionally, a trading strategy that capitalizes on fresh news alerts earns even higher Sharpe ratios.
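The abstract's point about negation can be made concrete with a minimal sketch. The toy word lists and scoring rule below are assumptions for illustration only, not the paper's actual method: a bag-of-words sentiment score depends only on word counts, so inserting "not" leaves the score unchanged even though the meaning flips.

```python
# Hypothetical illustration of why bag-of-words scoring is blind to
# negation. The lexicons and scoring rule are assumed for this sketch,
# not taken from the paper.
from collections import Counter

POSITIVE = {"beat", "gain", "strong"}   # toy positive-word lexicon (assumed)
NEGATIVE = {"miss", "loss", "weak"}     # toy negative-word lexicon (assumed)

def bow_sentiment(text: str) -> int:
    """Count positive minus negative words, ignoring word order and context."""
    counts = Counter(text.lower().split())
    return sum(counts[w] for w in POSITIVE) - sum(counts[w] for w in NEGATIVE)

with_negation = "earnings did not beat expectations"
without_negation = "earnings did beat expectations"

# Both headlines receive the same score (+1): the word "not" reverses the
# meaning, but a bag-of-words representation cannot see that.
print(bow_sentiment(with_negation), bow_sentiment(without_negation))  # → 1 1
```

A contextualized model, by contrast, encodes each token jointly with its surrounding words, so the two headlines map to different representations.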
Keywords: natural language processing (NLP), foundation models, BERT, GPT, OPT, ChatGPT, Bag-of-Words, Word2vec, machine learning, return prediction
JEL Classification: G10, G11, G14, C14, C11, C21, C22, C23, C58