The Memorization Problem: Can We Trust LLMs' Economic Forecasts?

Large language models (LLMs) cannot be trusted for economic forecasts during periods covered by their training data. Under black-box access, counterfactual forecasting ability is non-identified when the model has seen the realized values: any observed output is consistent with both genuine skill and memorization. Any evidence of memorization therefore represents only a lower bound on encoded knowledge. We demonstrate that LLMs have memorized economic and financial data, recalling exact values for dates before their knowledge cutoff. Instructions to respect historical boundaries fail to prevent recall-level accuracy, and masking fails because LLMs reconstruct entities and dates from minimal context. For dates after the cutoff, we observe no recall. Memorization also extends to embeddings.
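The pre- versus post-cutoff comparison described above can be illustrated with a minimal sketch. It is not the authors' procedure: the function name `recall_rate`, the cutoff date, and every number in `toy` are invented for illustration. The sketch only measures how often a model's stated value matches the realized value exactly, split by whether the date falls before or after an assumed knowledge cutoff.

```python
from datetime import date

def recall_rate(records, cutoff, tol=0.0):
    """Share of answers matching the realized value (within tol),
    reported separately for dates up to the cutoff and dates after it."""
    hits = {"pre": 0, "post": 0}
    counts = {"pre": 0, "post": 0}
    for when, predicted, realized in records:
        key = "pre" if when <= cutoff else "post"
        counts[key] += 1
        if abs(predicted - realized) <= tol:
            hits[key] += 1
    # None marks a period with no observations.
    return {k: (hits[k] / counts[k] if counts[k] else None) for k in counts}

# Hypothetical (date, model answer, realized value) triples; not real data.
toy = [
    (date(2020, 1, 31), 3.5, 3.5),
    (date(2021, 6, 30), 5.9, 5.9),
    (date(2022, 3, 31), 3.6, 3.8),
    (date(2024, 9, 30), 4.0, 4.3),
    (date(2024, 12, 31), 4.4, 4.1),
]

# Assumed knowledge cutoff, chosen only for this example.
rates = recall_rate(toy, cutoff=date(2023, 12, 31))
print(rates)
```

A high pre-cutoff match rate in such a check would not by itself separate skill from memorization, which is the non-identification point made above; it can only flag recall, and so gives a lower bound.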

Keywords: Large language models, Generative AI, Forecasting, ChatGPT, Memorization, Lookahead Bias, Textual Analysis, Embeddings

JEL Classification: C53, C58, E37, G10, G17

Suggested Citation:

Lopez-Lira, Alejandro and Tang, Yuehua and Zhu, Mingyin, The Memorization Problem: Can We Trust LLMs' Economic Forecasts? (April 15, 2025). Available at SSRN: https://ssrn.com/abstract=5217505 or http://dx.doi.org/10.2139/ssrn.5217505