Johnson Valen E, Payne Richard D, Wang Tianying, Asher Alex, Mandal Soutrik
Department of Statistics, Texas A&M University, College Station, TX.
J Am Stat Assoc. 2017;112(517):1-10. doi: 10.1080/01621459.2016.1240079. Epub 2016 Oct 7.
Investigators from a large consortium of scientists recently performed a multi-year study in which they replicated 100 psychology experiments. Although statistically significant results were reported in 97% of the original studies, statistical significance was achieved in only 36% of the replicated studies. This article presents a reanalysis of these data based on a formal statistical model that accounts for publication bias by treating outcomes from unpublished studies as missing data, while simultaneously estimating the distribution of effect sizes for those studies that tested nonnull effects. The resulting model suggests that more than 90% of tests performed in eligible psychology experiments tested negligible effects, and that publication biases based on P-values caused the observed rates of nonreproducibility. The results of this reanalysis provide a compelling argument both for increasing the threshold required for declaring scientific discoveries and for adopting statistical summaries of evidence that account for the high proportion of tested hypotheses that are false. Supplementary materials for this article are available online.
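The mechanism the abstract describes can be illustrated with a toy simulation. This is not the authors' model (which is a formal missing-data analysis); it is a minimal sketch, with illustrative parameters, of how a significance-based publication filter combined with a high proportion of null hypotheses depresses replication rates well below the near-100% significance rate seen in the published literature.

```python
import math
import random

random.seed(0)

def simulate(n_studies=10000, prop_null=0.9, ncp=2.5, alpha_z=1.96):
    """Toy publication-bias sketch (illustrative, not the paper's model).

    Each study tests either a null effect (with probability prop_null) or a
    nonnull effect whose z-statistic has noncentrality `ncp`. Only studies
    that reach two-sided significance are 'published'; each published study
    is then replicated once under identical conditions.
    """
    published, replicated = 0, 0
    for _ in range(n_studies):
        mean = 0.0 if random.random() < prop_null else ncp
        z_original = random.gauss(mean, 1.0)
        if abs(z_original) > alpha_z:      # publication filter on significance
            published += 1
            z_replication = random.gauss(mean, 1.0)
            if abs(z_replication) > alpha_z:
                replicated += 1
    return published, replicated

pub, rep = simulate()
print(f"published: {pub}, replicated: {rep} ({rep / pub:.0%})")
```

Although every published study was significant by construction, only a minority of replications succeed: the publication filter admits many false positives from the large pool of null effects, and those rarely replicate. Raising the significance threshold, as the article argues, shrinks that pool of admitted false positives.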