Department of Psychology, University of Amsterdam.
J Exp Psychol Gen. 2014 Aug;143(4):1457-75. doi: 10.1037/a0036731. Epub 2014 May 26.
Replication attempts are essential to the empirical sciences. Successful replication attempts increase researchers' confidence in the presence of an effect, whereas failed replication attempts induce skepticism and doubt. However, it is often unclear to what extent a replication attempt results in success or failure. To quantify replication outcomes we propose a novel Bayesian replication test that compares the adequacy of 2 competing hypotheses. The 1st hypothesis is that of the skeptic and holds that the effect is spurious; this is the null hypothesis that postulates a zero effect size, H₀: δ = 0. The 2nd hypothesis is that of the proponent and holds that the effect is consistent with the one found in the original study, an effect that can be quantified by a posterior distribution. Hence, the 2nd hypothesis (the replication hypothesis) is given by Hᵣ: δ ∼ "posterior distribution from original study." The weighted-likelihood ratio between H₀ and Hᵣ quantifies the evidence that the data provide for replication success or failure. In addition to the new test, we present several other Bayesian tests that address different but related questions concerning a replication study. These tests pertain to the independent conclusions of the separate experiments, the difference in effect size between the original experiment and the replication attempt, and the overall conclusion based on the pooled results. Together, this suite of Bayesian tests allows a relatively complete formalization of the way in which the result of a replication attempt alters our knowledge of the phenomenon at hand. The use of all Bayesian replication tests is illustrated with 3 examples from the literature. For experiments analyzed using the t test, computation of the new replication test requires only the t values and the numbers of participants from the original study and the replication study.
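To make the last point concrete, the following is a minimal sketch of how a ratio of this kind could be computed from two t values and two sample sizes. It is not the computation used in the article: it assumes a one-sample (or paired) design, a flat prior on δ in the original study, and a large-sample normal approximation in which the observed effect size d = t/√n is distributed as N(δ, 1/n). Under those assumptions the posterior from the original study is N(d_orig, 1/n_orig) and both marginal likelihoods have closed forms. The function names and the example numbers are hypothetical.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def replication_bf_normal_approx(t_orig, n_orig, t_rep, n_rep):
    """Approximate Bayes factor for H_r (effect as in original) versus H_0 (delta = 0).

    Assumptions (not taken from the article): one-sample design,
    flat prior on delta in the original study, and d = t / sqrt(n)
    treated as normally distributed with variance 1 / n.
    """
    # Observed standardized effect sizes in each study.
    d_orig = t_orig / math.sqrt(n_orig)
    d_rep = t_rep / math.sqrt(n_rep)

    # Approximate sampling variances of those effect sizes.
    var_orig = 1.0 / n_orig
    var_rep = 1.0 / n_rep

    # Under H_r, delta ~ N(d_orig, var_orig); integrating delta out gives
    # d_rep ~ N(d_orig, var_orig + var_rep).
    marginal_hr = normal_pdf(d_rep, d_orig, var_orig + var_rep)

    # Under H_0, delta = 0, so d_rep ~ N(0, var_rep).
    marginal_h0 = normal_pdf(d_rep, 0.0, var_rep)

    return marginal_hr / marginal_h0


if __name__ == "__main__":
    # Hypothetical numbers: the replication matches the original result...
    print(replication_bf_normal_approx(t_orig=3.0, n_orig=50, t_rep=3.0, n_rep=50))
    # ...versus a replication that finds no effect at all.
    print(replication_bf_normal_approx(t_orig=3.0, n_orig=50, t_rep=0.0, n_rep=50))
```

Values above 1 favor the replication hypothesis and values below 1 favor the null hypothesis. An exact treatment would replace the normal approximation with the noncentral t likelihood and integrate numerically over the original study's posterior.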