The Limits of p-Hacking: Some Thought Experiments

43 Pages. Posted: 16 Nov 2018. Last revised: 18 Nov 2020.

Andrew Y. Chen

Board of Governors of the Federal Reserve System

Date Written: November 11, 2020

Abstract

Suppose the 300+ published asset pricing factors are all spurious. How much p-hacking is required to produce these factors? If 10,000 researchers generate 8 factors every day, it takes hundreds of years. This is because dozens of published t-statistics exceed 6.0, while the corresponding p-value is infinitesimal, implying an astronomical amount of p-hacking in a general model. More structure implies p-hacking cannot address ≈100 published t-statistics that exceed 4.0, as they require an implausibly non-linear preference for t-statistics or even more p-hacking. These results imply mispricing, risk, and/or frictions have a key role in stock returns.
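The arithmetic behind the "hundreds of years" claim can be roughed out as follows. This is a back-of-the-envelope sketch, not the paper's model. It assumes each p-hacked factor is an independent draw whose t-statistic is standard normal under the null, and it uses 24 as a hypothetical stand-in for "dozens" of published t-statistics above 6.0. The function names are illustrative only.

```python
import math


def two_sided_p_value(t_stat: float) -> float:
    """Two-sided p-value for a t-statistic under a standard normal null.

    Uses the identity P(|Z| > t) = erfc(t / sqrt(2)).
    """
    return math.erfc(abs(t_stat) / math.sqrt(2.0))


def years_of_p_hacking(
    t_stat: float,
    n_factors_needed: int,
    researchers: int,
    factors_per_day: int,
) -> float:
    """Expected years needed to draw n spurious factors with |t| > t_stat.

    Assumption: draws are independent, so the expected number of draws
    needed for n successes is n / p (negative binomial mean).
    """
    p = two_sided_p_value(t_stat)
    expected_draws = n_factors_needed / p
    draws_per_day = researchers * factors_per_day
    return expected_draws / draws_per_day / 365.0


# Figures from the abstract: 10,000 researchers, 8 factors per day, t > 6.0.
# The count of 24 factors is an assumed value for "dozens".
p6 = two_sided_p_value(6.0)
years = years_of_p_hacking(6.0, n_factors_needed=24,
                           researchers=10_000, factors_per_day=8)
print(f"p-value for |t| > 6.0: {p6:.3e}")
print(f"expected years of p-hacking: {years:.0f}")
```

Under these assumptions the p-value is on the order of 10^-9, so even 80,000 spurious factors per day would take several centuries to produce the required number of extreme t-statistics.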

Keywords: Stock return anomalies, publication bias, data mining, multiple testing, p-hacking

JEL Classification: G10, G12

Suggested Citation

Chen, Andrew Y., The Limits of p-Hacking: Some Thought Experiments (November 11, 2020). Journal of Finance, Forthcoming, Available at SSRN: https://ssrn.com/abstract=3272572 or http://dx.doi.org/10.2139/ssrn.3272572

Andrew Y. Chen (Contact Author)

Board of Governors of the Federal Reserve System

20th Street and Constitution Avenue NW
Washington, DC 20551
United States
202-973-6941 (Phone)

HOME PAGE: http://sites.google.com/site/chenandrewy/
