Anderson Samantha F, Maxwell Scott E
Department of Psychology, University of Notre Dame.
Multivariate Behav Res. 2017 May-Jun;52(3):305-324. doi: 10.1080/00273171.2017.1289361. Epub 2017 Mar 7.
Psychology is undergoing a replication crisis. The discussion surrounding this crisis has centered on mistrust of previous findings. Researchers planning replication studies often use the original study sample effect size as the basis for sample size planning. However, this strategy ignores uncertainty and publication bias in estimated effect sizes, resulting in overly optimistic calculations. A psychologist who intends to obtain power of .80 in the replication study, and performs calculations accordingly, may have an actual power lower than .80. We performed simulations to reveal the magnitude of the difference between actual and intended power based on common sample size planning strategies and assessed the performance of methods that aim to correct for effect size uncertainty and/or bias. Our results imply that even if original studies reflect actual phenomena and were conducted in the absence of questionable research practices, popular approaches to designing replication studies may result in a low success rate, especially if the original study is underpowered. Methods correcting for bias and/or uncertainty generally had higher actual power, but were not a panacea for an underpowered original study. Thus, it becomes imperative that 1) original studies are adequately powered and 2) replication studies are designed with methods that are more likely to yield the intended level of power.
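The gap between intended and actual power described above can be illustrated with a small simulation. The following is a minimal sketch, not the authors' simulation code. It assumes a two-group design analyzed with a normal (z) approximation rather than a t test, and it models publication bias by assuming that only original studies significant in the expected direction are published. All function names and parameter values are hypothetical and chosen for illustration only.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used for quantiles and tail areas


def planned_n_per_group(d_hat, alpha=0.05, power=0.80):
    """Per-group n for a two-sided z test, treating d_hat as the true effect."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_power = STD_NORMAL.inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) / d_hat) ** 2)


def actual_power(true_d, n_per_group, alpha=0.05):
    """Power of the two-sided z test when the true standardized effect is true_d."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = true_d * math.sqrt(n_per_group / 2)
    return STD_NORMAL.cdf(shift - z_alpha) + STD_NORMAL.cdf(-shift - z_alpha)


def mean_actual_power(true_d, n_orig, intended_power=0.80, alpha=0.05,
                      only_significant=True, n_sims=20000, seed=1):
    """Average actual replication power when n is planned from the original estimate.

    Assumption: the original estimate is drawn from Normal(true_d, sqrt(2 / n_orig)).
    Assumption: non-positive estimates are never used for planning, and with
    only_significant=True only significant positive estimates are "published".
    """
    rng = random.Random(seed)
    se = math.sqrt(2 / n_orig)
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)
    powers = []
    for _ in range(n_sims):
        d_hat = rng.gauss(true_d, se)  # effect size reported by the original study
        if d_hat <= 0:
            continue
        if only_significant and d_hat / se <= z_alpha:
            continue  # the original study is not published
        n_rep = planned_n_per_group(d_hat, alpha, intended_power)
        powers.append(actual_power(true_d, n_rep, alpha))
    if not powers:
        raise ValueError("no simulated original study passed the filter")
    return sum(powers) / len(powers)


if __name__ == "__main__":
    # Hypothetical scenario: true d = 0.5, original study with 20 per group.
    print("with publication filter:", round(mean_actual_power(0.5, 20), 3))
    print("without filter:", round(mean_actual_power(0.5, 20, only_significant=False), 3))
```

In this sketch, an underpowered original study can only reach significance by overestimating the effect, so every replication planned from a published estimate ends up with fewer participants than the true effect requires. Raising `n_orig` narrows the gap, which is in line with the abstract's recommendation that original studies be adequately powered.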