Information Leakage in Backtesting
10 Pages. Posted: 4 May 2021. Last revised: 12 May 2021.
Date Written: May 12, 2021
Testing the performance of statistical models on historical time series requires careful handling of the data. Even if a dataset appears to be completely separated into an in-sample and an out-of-sample set, information may leak between the two. Such leakage can lead to a significant overestimation of a predictive model's out-of-sample performance. We provide experimental evidence to illustrate how randomised data splits lead to overfitting in the presence of time series structure. The experiment is set up in the framework of option replication, using both real-world and simulated data.
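The effect described above can be illustrated with a toy sketch. This is not the paper's option replication experiment: it assumes a simulated Gaussian random walk and a 1-nearest-neighbour-in-time predictor, and the names `nearest_neighbour_mse` and `compare_splits` are hypothetical. Under a randomised split, almost every test point has a training point one or two time steps away, so the measured error looks small. Under a chronological split, the same predictor has to extrapolate into the future, and the error is much larger.

```python
import numpy as np


def nearest_neighbour_mse(t_train, y_train, t_test, y_test):
    """MSE of a 1-nearest-neighbour predictor that matches on the time index."""
    # For each test time, find the training observation closest in time.
    nearest = np.abs(t_test[:, None] - t_train[None, :]).argmin(axis=1)
    return float(np.mean((y_train[nearest] - y_test) ** 2))


def compare_splits(n=2000, test_frac=0.25, seed=0):
    """Return (mse_random_split, mse_chronological_split) on a simulated random walk."""
    rng = np.random.default_rng(seed)
    t = np.arange(n, dtype=float)
    y = np.cumsum(rng.standard_normal(n))  # assumed data: Gaussian random walk
    n_test = int(n * test_frac)

    # Randomised split: test points are interleaved with training points in time.
    perm = rng.permutation(n)
    test_r, train_r = perm[:n_test], perm[n_test:]
    mse_random = nearest_neighbour_mse(t[train_r], y[train_r], t[test_r], y[test_r])

    # Chronological split: every test point lies strictly after the training period.
    train_c = np.arange(n - n_test)
    test_c = np.arange(n - n_test, n)
    mse_chrono = nearest_neighbour_mse(t[train_c], y[train_c], t[test_c], y[test_c])

    return mse_random, mse_chrono


if __name__ == "__main__":
    mse_random, mse_chrono = compare_splits()
    print(f"randomised split MSE:    {mse_random:.2f}")
    print(f"chronological split MSE: {mse_chrono:.2f}")
```

The gap between the two numbers does not reflect predictive skill. It arises because the randomised split lets the model see observations adjacent in time to each test point, which would not be available in a pseudo real-time setting.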
Keywords: Data snooping; Hedging; Information leakage; Overfitting; Pseudo real-time; Time series
JEL Classification: G13, C45