Department of Epidemiology, Harvard T. H. Chan School of Public Health.
Psychol Methods. 2019 Oct;24(5):571-575. doi: 10.1037/met0000223.
Psychological scientists are now trying to replicate published research from scratch to confirm the findings. In an increasingly widespread replication study design, each of several collaborating sites (such as universities) independently tries to replicate an original study, and the results are then synthesized across sites. Hedges and Schauer (2019) proposed statistical analyses for these replication projects; their analyses assess the extent to which results differ across the replication sites by testing for heterogeneity among the replication studies while excluding the original study. We agree with their premises regarding the limitations of existing analysis methods and the importance of accounting for heterogeneity among the replications, and this objective may be of interest in its own right. However, we argue that because these analyses focus only on whether the replication studies have effect sizes similar to one another, they are not particularly appropriate for assessing whether the replications in fact support the scientific effect under investigation, or for assessing the power of multisite replication projects. We reanalyze Hedges and Schauer's (2019) example dataset using alternative metrics of replication success that directly address these objectives. We reach a more optimistic conclusion regarding replication success than they did, illustrating that the alternative metrics can lead to quite different conclusions from those of Hedges and Schauer (2019).
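The distinction drawn above, between asking whether sites differ from one another and asking whether the effect itself is supported, can be illustrated numerically. The Python below is a minimal sketch under stated assumptions and is not the authors' reanalysis: the site estimates are made up (they are not Hedges and Schauer's example data), between-site variance is estimated with the standard DerSimonian-Laird estimator, and the "proportion of true effects above a threshold" is a naive plug-in that assumes normally distributed true effects. The threshold of 0.1 is an arbitrary illustrative choice.

```python
import math
from statistics import NormalDist


def cochran_q(estimates, variances):
    """Cochran's Q: inverse-variance weighted squared deviations from the fixed-effect mean."""
    weights = [1.0 / v for v in variances]
    mu_fe = sum(w * y for w, y in zip(weights, estimates)) / sum(weights)
    return sum(w * (y - mu_fe) ** 2 for w, y in zip(weights, estimates))


def random_effects(estimates, variances):
    """DerSimonian-Laird tau^2, then the random-effects pooled mean and its standard error."""
    k = len(estimates)
    weights = [1.0 / v for v in variances]
    q = cochran_q(estimates, variances)
    c = sum(weights) - sum(w * w for w in weights) / sum(weights)
    tau2 = max(0.0, (q - (k - 1)) / c)  # moment estimate, truncated at zero
    re_w = [1.0 / (v + tau2) for v in variances]
    mu = sum(w * y for w, y in zip(re_w, estimates)) / sum(re_w)
    se = math.sqrt(1.0 / sum(re_w))
    return mu, se, tau2


def prop_above(mu, tau2, threshold):
    """Plug-in share of true site effects above threshold, assuming they are Normal(mu, tau2)."""
    if tau2 == 0.0:
        return 1.0 if mu > threshold else 0.0
    return 1.0 - NormalDist(mu, math.sqrt(tau2)).cdf(threshold)


# Hypothetical site-level estimates (e.g. standardized mean differences) and
# their sampling variances; illustrative only, not data from the paper.
estimates = [0.30, 0.10, 0.45, 0.20, 0.05, 0.35]
variances = [0.01] * 6

q = cochran_q(estimates, variances)
mu, se, tau2 = random_effects(estimates, variances)
print(f"Q = {q:.2f} on {len(estimates) - 1} df")
print(f"pooled mean = {mu:.3f} (z = {mu / se:.2f}), tau^2 = {tau2:.4f}")
print(f"share of true effects > 0.1: {prop_above(mu, tau2, 0.1):.2f}")
```

With these made-up numbers, Q exceeds its degrees of freedom, so a heterogeneity-focused analysis would flag the sites as inconsistent; yet the pooled mean is clearly positive and most of the estimated distribution of true effects lies above the threshold. A real analysis would also quantify uncertainty in tau^2 and in the estimated proportion, which this sketch omits.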