Less is More? Reducing Biases and Overfitting in Machine Learning Return Predictions
58 Pages Posted: 5 Jul 2023
Date Written: July 5, 2023
Machine learning has become increasingly popular in asset pricing research. However, common modeling choices can lead to biases and overfitting. I show that group-specific machine learning models outperform models trained on a broader cross-section of stocks, challenging the common belief that more data leads to better machine learning models. The superior performance of group-specific models can be attributed to a lack of regularization of the target stock returns. Training on raw stock returns produces models that overfit to predicting the returns of smaller stocks, reducing the performance of value-weighted trading strategies. Simple adjustments to the target, such as removing the cross-sectional size-group median, produce economic gains similar to those of the group-specific models without the added computational cost. These findings emphasize the careful guidance required when designing and applying machine learning models for cross-sectional return prediction.
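The target adjustment mentioned in the abstract, removing the cross-sectional size-group median, might be sketched as follows. This is only an illustration under stated assumptions, not the paper's implementation: the column names (`date`, `size_group`, `ret`), the group labels, and the helper `demedian_by_size_group` are all hypothetical, and the abstract does not say how the size groups are defined.

```python
import pandas as pd


def demedian_by_size_group(df, ret_col="ret", date_col="date", group_col="size_group"):
    """Subtract the median return of each (date, size group) cell from each stock's return.

    Hypothetical helper: column names and grouping are assumptions for illustration.
    """
    group_median = df.groupby([date_col, group_col])[ret_col].transform("median")
    return df[ret_col] - group_median


# Toy single-period panel with made-up returns and made-up size-group labels.
panel = pd.DataFrame(
    {
        "date": ["2020-01"] * 6,
        "size_group": ["micro", "micro", "micro", "large", "large", "large"],
        "ret": [0.10, 0.30, -0.20, 0.01, 0.02, 0.03],
    }
)

# Adjusted target: return relative to the median stock in the same size group and period.
panel["ret_adj"] = demedian_by_size_group(panel)
print(panel)
```

After the adjustment, the median of the target is zero within every size group and period, so a model trained on `ret_adj` cannot lower its loss simply by learning level differences in returns between small and large stocks.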
Keywords: machine learning, asset pricing, overfitting, market capitalization, contextual analysis
JEL Classification: C52, C55, C58, G10, G17