Entity Neutering

41 Pages. Posted: 10 Apr 2025. Last revised: 9 Apr 2025.

Joseph Engelberg

University of California, San Diego (UCSD) - Rady School of Management

Asaf Manela

Washington University in St. Louis - John M. Olin Business School; Reichman University

William Mullins

University of California, San Diego (UCSD)

Luka Vulicevic

University of California, San Diego (UCSD), Rady School of Management, Students

Date Written: March 17, 2025

Abstract

Cutting-edge LLMs are trained on recent data, creating a concern about look-ahead bias. We propose a simple solution called entity neutering: using the LLM to find and remove all identifying information from text. In a sample of one million financial news articles, we verify that, after neutering, ChatGPT and other LLMs cannot recognize the firm or the time period for about 90% of the articles. Among these articles, the sentiment extracted from the raw text and the neutered text agree 90% of the time and have similar return predictability, with the difference providing an upper bound on look-ahead bias. The evidence here suggests that LLMs are able to effectively neuter text while maintaining semantic content. For look-ahead bias, LLMs can be both the problem and the solution.
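The workflow described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the prompt wording, the function names `neuter` and `agreement_rate`, the toy `stub_llm`, and the firm name "Acme Corp" are all assumptions made for the example. A real pipeline would pass in a callable that queries an actual LLM in place of the stub.

```python
import re
from typing import Callable, Sequence

# Hypothetical prompt: the paper's actual prompt is not given in the abstract.
NEUTER_PROMPT = (
    "Rewrite the text below, replacing every detail that could identify "
    "the firm or the time period with a generic placeholder.\n\n{text}"
)


def neuter(text: str, llm: Callable[[str], str]) -> str:
    """Ask an LLM (any prompt -> completion callable) to strip identifiers."""
    return llm(NEUTER_PROMPT.format(text=text))


def agreement_rate(raw: Sequence[str], neutered: Sequence[str]) -> float:
    """Share of articles whose sentiment label is unchanged by neutering."""
    if not raw or len(raw) != len(neutered):
        raise ValueError("label sequences must be non-empty and equal in length")
    matches = sum(a == b for a, b in zip(raw, neutered))
    return matches / len(raw)


def stub_llm(prompt: str) -> str:
    """Toy stand-in for a real model: masks one made-up firm name and 4-digit years."""
    text = prompt.split("\n\n", 1)[1]  # drop the instruction, keep the article
    text = re.sub(r"\bAcme Corp\b", "the company", text)
    return re.sub(r"\b(19|20)\d{2}\b", "a recent year", text)


if __name__ == "__main__":
    print(neuter("Acme Corp beat earnings estimates in 2021.", stub_llm))
    # Illustrative labels only, not data from the paper.
    print(agreement_rate(["pos", "neg", "pos", "neu"], ["pos", "neg", "neu", "neu"]))
```

Passing the model as a plain callable keeps the sketch independent of any particular LLM client. The same `agreement_rate` comparison could then be run on sentiment labels extracted from the raw and the neutered versions of each article.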

Keywords: Large Language Models, Look-ahead Bias, Textual Analysis

Suggested Citation

Engelberg, Joseph and Manela, Asaf and Mullins, William and Vulicevic, Luka, Entity Neutering (March 17, 2025). Available at SSRN: https://ssrn.com/abstract=5182756 or http://dx.doi.org/10.2139/ssrn.5182756

Joseph Engelberg (Contact Author)

University of California, San Diego (UCSD) - Rady School of Management

9500 Gilman Drive
Rady School of Management
La Jolla, CA 92093
United States

Asaf Manela

Washington University in St. Louis - John M. Olin Business School

One Brookings Drive
Campus Box 1133
St. Louis, MO 63130-4899
United States
314-935-9178 (Phone)

HOME PAGE: http://asafmanela.github.io/

Reichman University

Israel

William Mullins

University of California, San Diego (UCSD)

9500 Gilman Drive
Mail Code 0502
La Jolla, CA 92093-0112
United States

Luka Vulicevic

University of California, San Diego (UCSD), Rady School of Management, Students

9500 Gilman Drive
La Jolla, CA 92093
United States
6197196004 (Phone)

HOME PAGE: http://www.lukavulicevic.com
