Assessing Look-Ahead Bias in Stock Return Predictions Generated By GPT Sentiment Analysis

Posted: 19 Oct 2023
Last revised: 1 Feb 2024

Paul Glasserman

Columbia Business School

Caden Lin

Columbia University - Department of Mathematics

Date Written: September 28, 2023

Abstract

Large language models (LLMs), including ChatGPT, can extract profitable trading signals from the sentiment in news text. However, backtesting such strategies poses a challenge because LLMs are trained on many years of data, and backtesting produces biased results if the training and backtesting periods overlap. This bias can take two forms: a look-ahead bias, in which the LLM may have specific knowledge of the stock returns that followed a news article, and a distraction effect, in which general knowledge of the companies named interferes with the measurement of a text's sentiment. We investigate these sources of bias through trading strategies driven by the sentiment of financial news headlines. We compare trading performance based on the original headlines with de-biased strategies in which we remove the relevant company's identifiers from the text. In-sample (within the LLM training window), we find, surprisingly, that the anonymized headlines outperform, indicating that the distraction effect has a greater impact than look-ahead bias. This tendency is particularly strong for larger companies, about which we expect an LLM to have greater general knowledge. Out-of-sample, look-ahead bias is not a concern but distraction remains possible. Our proposed anonymization procedure is therefore potentially useful in out-of-sample implementation, as well as for de-biased backtesting.
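
The anonymization step described in the abstract (removing the relevant company's identifiers from a headline before scoring its sentiment) can be illustrated with a minimal Python sketch. This is not the authors' procedure, which the abstract does not specify. The function name `anonymize_headline`, the placeholder string "the company", and the idea of supplying a hand-built list of names and tickers are all assumptions made for illustration.

```python
import re


def anonymize_headline(headline, identifiers, placeholder="the company"):
    """Replace known identifiers of one company with a neutral placeholder.

    Hypothetical helper: `identifiers` is an assumed, caller-supplied list of
    names and tickers for the company the headline is about.
    """
    # Try longer identifiers first so "Apple Inc." wins over "Apple".
    ordered = sorted((s for s in identifiers if s), key=len, reverse=True)
    if not ordered:
        return headline
    alternatives = "|".join(re.escape(s) for s in ordered)
    # Lookarounds act as word boundaries that also work for identifiers
    # ending in punctuation, and prevent matches inside longer words.
    pattern = re.compile(rf"(?<!\w)(?:{alternatives})(?!\w)")
    return pattern.sub(placeholder, headline)


original = "Apple Inc. shares jump after AAPL beats estimates"
anonymized = anonymize_headline(original, ["Apple", "Apple Inc.", "AAPL"])
print(anonymized)
```

In a comparison of the kind the abstract describes, both `original` and `anonymized` would be sent to the same sentiment model, and the resulting signals would drive two otherwise identical trading strategies. A string-matching approach like this one only removes identifiers it is given, so product names or executives that reveal the company would need separate handling.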

Keywords: Textual analysis, sentiment, artificial intelligence

JEL Classification: G11, G12, G14

Suggested Citation

Glasserman, Paul and Lin, Caden, Assessing Look-Ahead Bias in Stock Return Predictions Generated By GPT Sentiment Analysis (September 28, 2023). Available at SSRN: https://ssrn.com/abstract=4586726 or http://dx.doi.org/10.2139/ssrn.4586726

Paul Glasserman (Contact Author)

Columbia Business School

New York, NY
United States

Caden Lin

Columbia University - Department of Mathematics

United States
