Caution Ahead: Numerical Reasoning and Look-ahead Bias in AI Models

113 Pages. Posted: 13 Jan 2025. Last revised: 25 Jan 2025.

Bradford Levy

University of Chicago - Booth School of Business

Date Written: December 25, 2024

Abstract

Recent work in accounting and finance has highlighted that modern AI systems exhibit superhuman performance on a variety of foundational tasks in these fields. However, the literature often does not provide an economic rationale for why AI models seem to outperform, largely because these models are black boxes. Through a series of experiments, I set out to open the black box and provide direct evidence on how and why AI models appear to perform so well on accounting and finance-related tasks. I show that much of the superior performance of AI models can be attributed to artifacts of the modeling itself, rather than to mechanisms grounded in economics. I focus on two key components of AI models that may bias inferences in papers that rely on them. First, I show that LLMs exhibit extremely poor numerical reasoning, so their application in these settings should proceed with caution. Second, I highlight that commercial LLMs suffer from significant look-ahead bias, which may explain a large portion of their predictive power in various settings. In the final part of the paper, I highlight numerous opportunities where AI systems can advance our research.

Keywords: AI, large language models, numerical reasoning, memorization, look-ahead bias, representation learning, multi-modal models

Suggested Citation

Levy, Bradford, Caution Ahead: Numerical Reasoning and Look-ahead Bias in AI Models (December 25, 2024). Fama-Miller Working Paper, Available at SSRN: https://ssrn.com/abstract=5082861 or http://dx.doi.org/10.2139/ssrn.5082861

Bradford Levy (Contact Author)

University of Chicago - Booth School of Business

5807 S Woodlawn Ave
Chicago, IL 60637
United States
