51 Pages Posted: 14 Aug 2017 Last revised: 16 Aug 2017
Date Written: August 12, 2017
We implement a data mining approach to generate about 2.1 million trading strategies. This large set of strategies serves as a laboratory to evaluate the seriousness of p-hacking and data snooping in finance. We apply multiple hypothesis testing techniques that account for cross-correlations in signals and returns to produce t-statistic thresholds that control the proportion of false discoveries. We find that the difference in rejection rates produced by single and multiple hypothesis testing is such that most rejections of the null of no outperformance under single hypothesis testing are likely false (i.e., we find a very high rate of type I errors). Combining statistical criteria with economic considerations, we find that a remarkably small number of strategies survive our thorough vetting procedure. Even these surviving strategies have no theoretical underpinnings. Overall, p-hacking is a serious problem and, correcting for it, outperforming trading strategies are rare.
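The gap between single and multiple hypothesis testing described above can be illustrated with a small simulation. The sketch below is not the authors' procedure: it swaps in the plain Benjamini-Hochberg step-up rule, which assumes independent (or positively dependent) tests and does not account for cross-correlations in signals and returns. The data are simulated, and every name and parameter (`simulate`, `n_null`, `n_true`, `true_mean`, the 5% levels) is a hypothetical choice made for illustration only.

```python
import math
import random


def two_sided_p(t):
    """Two-sided p-value of a t-statistic under a standard normal null."""
    return math.erfc(abs(t) / math.sqrt(2.0))


def bh_reject(p_values, q=0.05):
    """Benjamini-Hochberg step-up rule; returns True where the null is rejected."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        # Keep the largest rank whose p-value sits under the BH line q * rank / m.
        if p_values[i] <= q * rank / m:
            k_max = rank
    reject = [False] * m
    for i in order[:k_max]:
        reject[i] = True
    return reject


def simulate(n_null=9900, n_true=100, true_mean=4.0, seed=0):
    """Hypothetical t-statistics: mostly true nulls plus a few real effects."""
    rng = random.Random(seed)
    t_stats = [rng.gauss(0.0, 1.0) for _ in range(n_null)]
    t_stats += [rng.gauss(true_mean, 1.0) for _ in range(n_true)]
    is_null = [True] * n_null + [False] * n_true
    return t_stats, is_null


def false_share(reject, is_null):
    """Fraction of rejections that fall on true nulls (0.0 if nothing is rejected)."""
    n_rejected = sum(reject)
    n_false = sum(r and h0 for r, h0 in zip(reject, is_null))
    return n_false / n_rejected if n_rejected else 0.0


t_stats, is_null = simulate()
p_values = [two_sided_p(t) for t in t_stats]

single = [p <= 0.05 for p in p_values]   # each strategy tested in isolation
multiple = bh_reject(p_values, q=0.05)   # false discovery rate targeted at 5%

# Smallest |t| among BH rejections: the implied t-statistic threshold.
bh_threshold = min(
    (abs(t) for t, r in zip(t_stats, multiple) if r), default=float("nan")
)

print("single-test rejections:", sum(single))
print("share of false rejections (single):", round(false_share(single, is_null), 3))
print("BH rejections:", sum(multiple))
print("share of false rejections (BH):", round(false_share(multiple, is_null), 3))
print("implied |t| threshold under BH:", round(bh_threshold, 2))
```

In this toy setting the single-test rule rejects many true nulls simply because so many strategies are tested, while the step-up rule raises the implied t-statistic threshold well above the conventional 1.96. Correlated strategies, as in the paper, call for procedures designed for dependence rather than this simplified rule.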
Keywords: Hypothesis testing, False discoveries, Trading strategies
JEL Classification: G10, G11, G12
Suggested Citation:
Chordia, Tarun and Goyal, Amit and Saretto, Alessio, p-Hacking: Evidence from Two Million Trading Strategies (August 12, 2017). Available at SSRN: https://ssrn.com/abstract=3017677