Entity Neutering
41 Pages. Posted: 10 Apr 2025. Last revised: 9 Apr 2025.
Date Written: March 17, 2025
Abstract
Cutting-edge LLMs are trained on recent data, creating a concern about look-ahead bias. We propose a simple solution called entity neutering: using the LLM to find and remove all identifying information from text. In a sample of one million financial news articles, we verify that, after neutering, ChatGPT and other LLMs cannot recognize the firm or the time period for about 90% of the articles. Among these articles, the sentiment extracted from the raw text and the neutered text agrees 90% of the time and has similar return predictability, with the difference providing an upper bound on look-ahead bias. The evidence suggests that LLMs can effectively neuter text while preserving semantic content. For look-ahead bias, LLMs can be both the problem and the solution.
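The two-step procedure described in the abstract (neuter, then verify) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the prompt wording and the function names (`neuter`, `is_neutered`, `call_llm`) are assumptions, and `call_llm` stands in for any chat-LLM API call that maps a prompt string to a response string.

```python
# Illustrative sketch of entity neutering with an LLM.
# Prompts and helper names are hypothetical; `call_llm` is a placeholder
# for a real LLM API call (e.g., a chat-completions request).

NEUTER_PROMPT = (
    "Remove all identifying information (firm names, tickers, people, "
    "dates, locations, product names) from the following news article, "
    "replacing each with a generic placeholder while preserving the "
    "article's meaning:\n\n{article}"
)

VERIFY_PROMPT = (
    "Which firm and which time period does the following article refer to? "
    "Answer 'unknown' if you cannot tell:\n\n{article}"
)

def neuter(article: str, call_llm) -> str:
    """Ask the LLM to strip identifying entities from the article."""
    return call_llm(NEUTER_PROMPT.format(article=article))

def is_neutered(article: str, call_llm) -> bool:
    """Verification step: the LLM should no longer be able to name
    the firm or the time period for a successfully neutered article."""
    answer = call_llm(VERIFY_PROMPT.format(article=article))
    return "unknown" in answer.lower()
```

In the paper's setting, only articles that pass the `is_neutered` check (about 90% of the sample) would be kept for the downstream sentiment comparison.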
Keywords: Large Language Models, Look-ahead Bias, Textual Analysis