Small Telescopes: Detectability and the Evaluation of Replication Results
University of Pennsylvania - The Wharton School
December 10, 2013
How should we evaluate replication results? By far the most common approach is to consider them successful if p<.05 and failures otherwise. Because it ignores effect size, this approach is fundamentally flawed and easily leads to inferences opposite to what the evidence warrants. Its problems are demonstrated by revisiting replications of three famous findings: the embodiment of morality, the endowment effect, and weather & happiness. A new approach is proposed. It combines effect-size estimation and hypothesis testing into a single test, which evaluates whether a replication rules out an effect big enough to have been detectable with the original study. Benefits include: (i) distinguishing p>.05 replications that are simply too noisy from those that indicate the effect is zero or negligible, (ii) “protecting” true findings from underpowered replications, (iii) incorporating effect size within a hypothesis-testing framework, and vice versa, and (iv) arriving at intuitively compelling inferences in general and for the revisited replications in particular.
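The logic of the proposed test can be illustrated with a minimal sketch. It is not the paper's implementation, and it rests on several assumptions that this abstract does not state: a two-cell between-subjects design, Cohen's d as the effect-size measure, a normal approximation to the test statistic, and one-third power as the threshold for an effect being "detectable" with the original study. The function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, used for the approximation


def smallest_detectable_d(n_per_cell, power=1 / 3, alpha=0.05):
    """Effect size (Cohen's d) that a two-cell study with n_per_cell
    observations per cell would detect with the given power.

    Assumption: normal approximation, two-sided alpha, and the
    opposite-direction rejection tail is ignored.
    """
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_power = _STD_NORMAL.inv_cdf(power)
    return (z_crit + z_power) * sqrt(2 / n_per_cell)


def rules_out_detectable_effect(d_rep, n_rep_per_cell, n_orig_per_cell,
                                alpha=0.05):
    """One-sided test of whether the replication estimate d_rep is
    significantly smaller than the effect the original could detect.

    Returns (decision, p_value). Assumption: the standard error of d
    is approximated by sqrt(2 / n), which is reasonable for small d.
    """
    d_small = smallest_detectable_d(n_orig_per_cell)
    se_rep = sqrt(2 / n_rep_per_cell)
    p_value = _STD_NORMAL.cdf((d_rep - d_small) / se_rep)
    return p_value < alpha, p_value


if __name__ == "__main__":
    # Original study: 20 per cell. Both replications observe d = 0.
    print(rules_out_detectable_effect(0.0, 20, 20))
    print(rules_out_detectable_effect(0.0, 200, 20))
```

Under these assumptions, a replication that observes no effect at all with the same 20 observations per cell as the original is inconclusive (too noisy), whereas the same estimate from 200 observations per cell rules out a detectable effect. This is the distinction described in benefit (i).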
Number of Pages in PDF File: 26
Keywords: Replications, power, p-values, effect size
Working Papers Series
Date posted: May 3, 2013 ; Last revised: December 11, 2013