Pernicious P-Values: Statistical Proof of Not Very Much
42 University of Dayton Law Review 113 (2017)
53 Pages. Posted: 17 Apr 2018
“Null hypothesis significance testing” (“NHST”) is central to many discrimination cases. It compares the employer’s workforce to the workforce that would be statistically “expected” if the employer were selecting randomly with respect to race, sex, etc. The difference allows computation of a “p-value,” which is the probability that a randomly selected employer would have so great a disparity if the “null hypothesis” (that the employer is selecting randomly) is true.
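To make the computation described above concrete, here is a minimal sketch of a one-sided p-value for a hiring disparity. The figures (a 40% share of women in the applicant pool, 100 hires, 30 of them women) are hypothetical and do not come from the article, and the binomial model is only one simple way to formalize "selecting randomly."

```python
from math import comb


def binomial_lower_tail(k: int, n: int, p: float) -> float:
    """Return P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


# Hypothetical figures for illustration only (not taken from the article).
pool_share = 0.40   # assumed share of women in the relevant applicant pool
hires = 100         # assumed number of people hired
women_hired = 30    # assumed number of women among those hires

# Workforce "expected" under the null hypothesis of random selection.
expected_women = pool_share * hires

# Probability, computed ASSUMING the null is true, of a shortfall at least
# this large. This is the one-sided p-value.
p_value = binomial_lower_tail(women_hired, hires, pool_share)

print(f"expected women hired under the null: {expected_women:.0f}")
print(f"one-sided p-value: {p_value:.4f}")
```

Note that the null hypothesis enters the calculation as an input (`pool_share` is taken as the true selection probability), so the result is a statement about the data under that assumption, not about the assumption itself.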
Unfortunately, courts routinely misinterpret the p-value as the probability of random selection, that is, the probability that the null hypothesis is true. This is an instance of the “transposition fallacy”: equating the probability of the data given the hypothesis with the probability of the hypothesis given the data. Because the p-value is calculated on the assumption that the null hypothesis is true, it cannot supply the probability that the null hypothesis is false. By engaging in the transposition fallacy, courts accord the p-value far greater weight than is warranted, sometimes explicitly linking it with the preponderance-of-evidence standard. In fact, the p-value provides virtually no useful information and should be excluded from trial.
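The gap between the two conditional probabilities can be illustrated with Bayes' theorem. The sketch below is a simplification under stated assumptions: it treats the null and a single alternative as the only two possibilities, and every number in it (the 0.03 probability of the disparity under the null, the 0.30 probability under the alternative, and the prior probabilities) is hypothetical rather than drawn from the article.

```python
def posterior_null(prior_null: float,
                   p_data_given_null: float,
                   p_data_given_alt: float) -> float:
    """Bayes' theorem for two competing hypotheses: P(null | data)."""
    joint_null = prior_null * p_data_given_null
    joint_alt = (1 - prior_null) * p_data_given_alt
    return joint_null / (joint_null + joint_alt)


# Hypothetical inputs for illustration only.
P_DATA_GIVEN_NULL = 0.03   # plays the role of the p-value
P_DATA_GIVEN_ALT = 0.30    # assumed chance of such a disparity under the alternative

# Same data, different prior probabilities that the employer selects randomly.
posteriors = {
    prior: posterior_null(prior, P_DATA_GIVEN_NULL, P_DATA_GIVEN_ALT)
    for prior in (0.50, 0.90, 0.99)
}

for prior, post in posteriors.items():
    print(f"prior P(null) = {prior:.2f} -> P(null | data) = {post:.3f}")
```

With the probability of the data given the null held fixed at 0.03, the probability of the null given the data ranges from under 10% to over 90% depending on the prior. The first quantity alone therefore cannot stand in for the second.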
The ubiquitous transposition fallacy is not the only problem with NHST. Indeed, there is increasing dissatisfaction with NHST in the social sciences. Among its other flaws are that it relies on the null hypothesis of “no difference” between the compared groups, which is unlikely a priori; that it leads to conflation of statistical and practical significance, resulting in statistical disparities’ being accorded excessive weight; that it assumes a perfectly specified model (an assumption that is seldom met, even approximately); and that the alternative substantive hypothesis (that the employer is discriminating) does not satisfy the requirement that the null hypothesis and the alternative hypothesis be exhaustive and mutually exclusive, since there are many alternative hypotheses that are consistent with rejection of the null.
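The conflation of statistical and practical significance can be seen in a short sketch. It uses a normal approximation to the binomial and hypothetical figures (an expected share of 40% and an observed share of 39%), none of which come from the article. The one-percentage-point disparity is identical in both cases; only the sample size changes.

```python
from math import erfc, sqrt


def one_sided_p(observed_share: float, expected_share: float, n: int) -> float:
    """Lower-tail p-value for a proportion, using a normal approximation."""
    standard_error = sqrt(expected_share * (1 - expected_share) / n)
    z = (observed_share - expected_share) / standard_error
    # Standard normal CDF at z, written with erfc to stay accurate in the far tail.
    return 0.5 * erfc(-z / sqrt(2))


# Hypothetical figures for illustration only.
EXPECTED_SHARE = 0.40
OBSERVED_SHARE = 0.39   # the same one-point disparity in both scenarios

p_small_sample = one_sided_p(OBSERVED_SHARE, EXPECTED_SHARE, n=100)
p_large_sample = one_sided_p(OBSERVED_SHARE, EXPECTED_SHARE, n=100_000)

print(f"n = 100:     p = {p_small_sample:.3f}")
print(f"n = 100,000: p = {p_large_sample:.2e}")
```

Under these assumptions the small sample is nowhere near conventional significance thresholds, while the large sample is far beyond them, even though the practical size of the disparity has not changed.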
Thus, as with p-values themselves, use of NHST in discrimination trials should be eliminated, or, at a minimum, limited to circumstances in which the identified problems are avoided.
Keywords: Title VII, statistical proof, p-values, hypothesis testing, discrimination, evidence, Rule 403
JEL Classification: K31, C12