Interpretable Machine Learning for Earnings Forecasts: Leveraging High-Dimensional Financial Statement Data
70 Pages. Posted: 14 Nov 2023. Last revised: 27 Feb 2025
Date Written: October 31, 2023
Abstract
We predict earnings for forecast horizons of up to five years by feeding the entire set of Compustat financial statement data to state-of-the-art machine learning models capable of approximating arbitrary functional forms. Our approach improves one-year-ahead prediction by an average of 11% relative to the best-performing traditional linear approach. This superior performance is consistent across a variety of evaluation metrics and firm subsamples, and it translates into more profitable investment strategies. Extensive model interpretation reveals that income statement variables, especially different definitions of earnings, are by far the most important predictors. However, as the forecast horizon extends, income statement variables decline in relevance while balance sheet information becomes more significant. Lastly, we show that the influence of interactions and non-linearities on the machine learning forecast is modest, but substantial differences between firm subsamples exist.
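As a rough illustration of how a percentage improvement over a benchmark forecast could be computed, the sketch below assumes accuracy is measured by mean absolute forecast error; the abstract does not name the metric behind the 11% figure, so this choice is an assumption. The function names and the toy numbers are hypothetical and are not taken from the paper.

```python
from statistics import mean


def mean_abs_error(actual, forecast):
    """Mean absolute forecast error over paired observations."""
    if len(actual) != len(forecast):
        raise ValueError("actual and forecast must have the same length")
    return mean(abs(a - f) for a, f in zip(actual, forecast))


def relative_improvement(actual, benchmark_forecast, model_forecast):
    """Fractional error reduction of the model relative to the benchmark.

    Assumes the benchmark error is non-zero.
    """
    benchmark_error = mean_abs_error(actual, benchmark_forecast)
    model_error = mean_abs_error(actual, model_forecast)
    return (benchmark_error - model_error) / benchmark_error


# Toy, made-up values purely for illustration (not results from the paper).
actual_earnings = [1.0, 2.0, 3.0, 4.0]
linear_forecast = [1.5, 2.5, 2.5, 4.5]    # hypothetical linear benchmark
ml_forecast = [1.25, 2.25, 2.75, 4.25]    # hypothetical ML forecast

improvement = relative_improvement(actual_earnings, linear_forecast, ml_forecast)
print(f"Relative improvement: {improvement:.0%}")
```

A positive value means the model's average error is lower than the benchmark's. Other error measures (for example, errors scaled by firm size) would slot into the same comparison.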
Keywords: Cross-Sectional Earnings Models, Machine Learning, Earnings Forecasts
JEL Classification: G11, G12, G17, G31, G32, M40, M41
Suggested Citation:
Hess, Dieter and Simon, Frederik and Weibels, Sebastian, Interpretable Machine Learning for Earnings Forecasts: Leveraging High-Dimensional Financial Statement Data (October 31, 2023). Available at SSRN: https://ssrn.com/abstract=4619313 or http://dx.doi.org/10.2139/ssrn.4619313