Department of Psychological Sciences, Purdue University, 703 Third Street, West Lafayette, IN 47906, USA.
Psychon Bull Rev. 2012 Dec;19(6):975-91. doi: 10.3758/s13423-012-0322-y.
Replication of empirical findings plays a fundamental role in science. Among experimental psychologists, successful replication enhances belief in a finding, while a failure to replicate is often interpreted to mean that one of the experiments is flawed. This view is wrong. Because experimental psychology uses statistics, empirical findings should appear with predictable probabilities. In a misguided effort to demonstrate successful replication of empirical findings and avoid failures to replicate, experimental psychologists sometimes report too many positive results. Rather than strengthen confidence in an effect, too much successful replication actually indicates publication bias, which invalidates entire sets of experimental findings. Researchers cannot judge the validity of a set of biased experiments because the experiment set may consist entirely of type I errors. This article shows how an investigation of the effect sizes from reported experiments can test for publication bias by looking for too much successful replication. Simulated experiments demonstrate that the publication bias test is able to discriminate biased experiment sets from unbiased experiment sets, but it is conservative about reporting bias. The test is then applied to several studies of prominent phenomena that highlight how publication bias contaminates some findings in experimental psychology. Additional simulated experiments demonstrate that using Bayesian methods of data analysis can reduce (and in some cases, eliminate) the occurrence of publication bias. Such methods should be part of a systematic process to remove publication bias from experimental psychology and reinstate the important role of replication as a final arbiter of scientific findings.
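The idea of "too much successful replication" can be illustrated with a rough sketch. The abstract does not spell out the computation, so everything below is an assumption made for illustration: a normal approximation to the power of a two-sided, two-sample test, a pooled standardized effect size d supplied by the caller, independent experiments, and a 0.1 cutoff for flagging a set as suspicious. The function names approx_power and prob_all_significant are hypothetical, and the numbers are invented.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample test.

    Uses a normal approximation (an assumption of this sketch), with
    d as the standardized effect size and n_per_group per condition.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = d * sqrt(n_per_group / 2)  # expected test statistic under d
    # Probability of landing in either rejection region.
    return (1 - z.cdf(z_crit - shift)) + z.cdf(-z_crit - shift)


def prob_all_significant(d, sample_sizes, alpha=0.05):
    """Probability that every experiment in the set rejects the null,
    assuming independent experiments that share the effect size d."""
    p = 1.0
    for n in sample_sizes:
        p *= approx_power(d, n, alpha)
    return p


# Hypothetical set: five experiments, 30 participants per group, d = 0.5.
sizes = [30, 30, 30, 30, 30]
p_all = prob_all_significant(0.5, sizes)
suspicious = p_all < 0.1  # assumed cutoff, not stated in the abstract
print(f"P(all {len(sizes)} significant) = {p_all:.3f}, suspicious = {suspicious}")
```

Under these assumptions each experiment has roughly even odds of reaching significance, so five significant results out of five would be unlikely without selective reporting. That is the sense in which a perfect replication record can count as evidence of bias rather than as support for the effect.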