Less is More? Reducing Biases and Overfitting in Machine Learning Return Predictions

58 Pages Posted: 5 Jul 2023

Clint Howard

Robeco Quantitative Investments; University of Technology Sydney

Date Written: July 5, 2023


Machine learning has become increasingly popular in asset pricing research. However, common modeling choices can lead to biases and overfitting. I show that group-specific machine learning models outperform models trained on a broader cross-section of stocks, challenging the common belief that more data leads to better machine learning models. The superior performance of group-specific models can be attributed to a lack of regularization of the target stock returns. Training on raw stock returns produces models that overfit to predicting the returns of smaller stocks, reducing the performance of value-weighted trading strategies. Simple adjustments to the target, such as removing the cross-sectional size-group median, produce economic gains similar to those of the group-specific models without the added computational cost. These findings emphasize the care required when designing and applying machine learning models for cross-sectional return prediction.
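
The target adjustment mentioned in the abstract, removing the cross-sectional size-group median, can be illustrated with a minimal sketch. This is not the paper's code: the column names (`date`, `size_group`, `ret`), the output column `ret_adj`, the group labels, and the toy numbers are all hypothetical, and how the paper actually assigns stocks to size groups is not stated here. The sketch only shows the general idea of subtracting, within each date, the median return of a stock's size group from its raw return before using it as a prediction target.

```python
import pandas as pd


def demedian_by_size_group(df, date_col="date", group_col="size_group", ret_col="ret"):
    """Return a copy of df with a 'ret_adj' column: the raw return minus the
    median return of the same size group on the same date."""
    # transform("median") broadcasts each (date, group) median back to every row
    group_median = df.groupby([date_col, group_col])[ret_col].transform("median")
    out = df.copy()
    out["ret_adj"] = out[ret_col] - group_median
    return out


# Toy panel (hypothetical values): one date, two size groups, three stocks each
panel = pd.DataFrame({
    "date": ["2020-01"] * 6,
    "size_group": ["large", "large", "large", "small", "small", "small"],
    "ret": [0.01, 0.02, 0.03, -0.10, 0.05, 0.20],
})

adjusted = demedian_by_size_group(panel)
print(adjusted)
```

After the adjustment, the median target within every date and size group is zero, so a model trained on `ret_adj` cannot lower its loss simply by fitting level differences in returns between size groups.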

Keywords: machine learning, asset pricing, overfitting, market capitalization, contextual analysis

JEL Classification: C52, C55, C58, G10, G17

Suggested Citation

Howard, Clint, Less is More? Reducing Biases and Overfitting in Machine Learning Return Predictions (July 5, 2023). Available at SSRN: https://ssrn.com/abstract=4497739 or http://dx.doi.org/10.2139/ssrn.4497739

Clint Howard (Contact Author)

Robeco Quantitative Investments

Weena 850
Rotterdam, 3014 DA

University of Technology Sydney

15 Broadway, Ultimo
PO Box 123
Sydney, NSW 2007
