The Limits of p-Hacking: Some Thought Experiments
43 pages. Posted: 16 Nov 2018. Last revised: 18 Nov 2020.
Date Written: November 11, 2020
Suppose the 300+ published asset pricing factors are all spurious. How much p-hacking would be required to produce them? If 10,000 researchers generate 8 factors every day, it would take hundreds of years. This is because dozens of published t-statistics exceed 6.0, and the corresponding p-values are infinitesimal, implying an astronomical amount of p-hacking in a general model. Imposing more structure on the model implies that p-hacking cannot account for the ≈100 published t-statistics that exceed 4.0, as these would require either an implausibly non-linear preference for t-statistics or even more p-hacking. These results imply that mispricing, risk, and/or frictions play a key role in stock returns.
Keywords: Stock return anomalies, publication bias, data mining, multiple testing, p-hacking
JEL Classification: G10, G12