Backtest Overfitting in the Machine Learning Era: A Comparison of Out-of-Sample Testing Methods in a Synthetic Controlled Environment

26 Pages Posted: 17 Jan 2024

Hamid R. Arian

York University

Daniel Norouzi M.

University of Toronto - RiskLab; Sharif University of Technology

Luis A. Seco

University of Toronto; University of Toronto - RiskLab

There are 2 versions of this paper

Date Written: January 6, 2024


This research explores the integration of advanced statistical models and machine learning in financial analytics, representing a shift from traditional to data-driven methods. We address a critical gap in quantitative finance: the need for robust model evaluation and out-of-sample testing methodologies, particularly cross-validation techniques tailored to financial markets. We present a comprehensive framework for assessing these methods that accounts for the distinctive characteristics of financial data, such as non-stationarity, autocorrelation, and regime shifts. Our analysis shows that the Combinatorial Purged Cross-Validation (CPCV) method is markedly superior in mitigating overfitting risk, outperforming traditional methods such as K-Fold, Purged K-Fold, and especially Walk-Forward, as evidenced by its lower Probability of Backtest Overfitting (PBO) and superior Deflated Sharpe Ratio (DSR) test statistic. Walk-Forward, by contrast, exhibits notable shortcomings in false discovery prevention, characterized by increased temporal variability and weaker stationarity, whereas CPCV is stable and efficient, which supports its reliability for financial strategy development. The analysis also suggests that choosing between Purged K-Fold and K-Fold requires caution, given their comparable performance and the potential impact on the robustness of training data in out-of-sample testing. Our investigation uses a Synthetic Controlled Environment that incorporates the Heston Stochastic Volatility, Merton Jump Diffusion, and Drift-Burst Hypothesis models alongside regime-switching models. This approach provides a nuanced simulation of market conditions, offering new insights into evaluating cross-validation techniques. Our study underscores the necessity of specialized validation methods in financial modeling, especially in the face of growing regulatory demands and complex market dynamics.
It bridges theoretical and practical finance, offering a fresh outlook on financial model validation. Highlighting the significance of advanced cross-validation techniques like CPCV, our research enhances the reliability and applicability of financial models in decision-making.
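
To make the CPCV idea in the abstract concrete, the following is a minimal sketch of how combinatorial train/test splits can be generated. It is not the authors' implementation. The function names `cpcv_splits` and `n_backtest_paths` and the `gap` parameter are hypothetical, and the fixed-width `gap` is a simplified stand-in for purging and embargoing, which in practice depend on the label horizon. The sketch assumes the observations are ordered in time and partitioned into contiguous groups.

```python
from itertools import combinations
from math import comb

import numpy as np


def cpcv_splits(n_samples, n_groups=6, n_test_groups=2, gap=0):
    """Yield (train_idx, test_idx) for every combination of test groups.

    Assumption: `gap` observations on each side of a test group are dropped
    from training, as a crude substitute for label-aware purging/embargo.
    """
    # Contiguous, time-ordered groups of (almost) equal size.
    groups = np.array_split(np.arange(n_samples), n_groups)
    for test_combo in combinations(range(n_groups), n_test_groups):
        test_idx = np.concatenate([groups[g] for g in test_combo])
        keep = np.ones(n_samples, dtype=bool)
        keep[test_idx] = False
        for g in test_combo:
            start, end = groups[g][0], groups[g][-1] + 1
            # Drop training observations adjacent to the test group.
            keep[max(start - gap, 0):start] = False
            keep[end:end + gap] = False
        yield np.flatnonzero(keep), test_idx


def n_backtest_paths(n_groups, n_test_groups):
    """Number of full backtest paths that the splits can be recombined into."""
    return comb(n_groups, n_test_groups) * n_test_groups // n_groups


splits = list(cpcv_splits(n_samples=120, n_groups=6, n_test_groups=2, gap=2))
print(len(splits), n_backtest_paths(6, 2))
```

With 6 groups and 2 test groups per split, this produces 15 splits that can be recombined into 5 backtest paths, whereas Walk-Forward yields a single path. Having several paths is what allows a distribution of out-of-sample performance to be estimated instead of a single point.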

Keywords: Quantitative Finance, Financial Machine Learning, Cross-Validation, Probability of Backtest Overfitting

JEL Classification: C58, G17, C52, C53, G12

Suggested Citation

Arian, Hamid R. and Norouzi Mobarekeh, Daniel and Seco, Luis A., Backtest Overfitting in the Machine Learning Era: A Comparison of Out-of-Sample Testing Methods in a Synthetic Controlled Environment (January 6, 2024). Available at SSRN: or

Hamid R. Arian (Contact Author)

York University

4700 Keele Street
Toronto, Ontario M3J 1P3


Daniel Norouzi Mobarekeh

University of Toronto - RiskLab

1 Spadina Crescent
Toronto, ON M5S 3G3

Sharif University of Technology


Luis A. Seco

University of Toronto

Department of Mathematics
Toronto, Ontario M5S 3E6

University of Toronto - RiskLab
