Assessing heterogeneity and power in replications of psychological experiments.

Affiliations

Institute for Policy Research, Northwestern University.

Department of Statistics, Northwestern University.

Publication information

Psychol Bull. 2020 Aug;146(8):701-719. doi: 10.1037/bul0000232. Epub 2020 Apr 9.

Abstract

In this study, we reanalyze recent empirical research on replication from a meta-analytic perspective. We argue that there are different ways to define "replication failure," and that analyses can focus on exploring variation among replication studies or assess whether their results contradict the findings of the original study. We apply this framework to a set of psychological findings that have been replicated and assess the sensitivity of these analyses. We find that tests for replication that involve only a single replication study are almost always severely underpowered. Among the 40 findings for which ensembles of multisite direct replications were conducted, we find that between 11 and 17 (28% to 43%) ensembles produced heterogeneous effects, depending on how replication is defined. This heterogeneity could not be completely explained by moderators documented by replication research programs. We also find that these ensembles were not always well-powered to detect potentially meaningful values of heterogeneity. Finally, we identify several discrepancies between the results of original studies and the distribution of effects found by multisite replications but note that these analyses also have low power. We conclude by arguing that efforts to assess replication would benefit from further methodological work on designing replication studies to ensure analyses are sufficiently sensitive. (PsycInfo Database Record (c) 2020 APA, all rights reserved).

