Pfeiffer Thomas, Rand David G, Dreber Anna
Program for Evolutionary Dynamics, Harvard University, Cambridge, Massachusetts, United States of America.
PLoS One. 2009;4(2):e4607. doi: 10.1371/journal.pone.0004607. Epub 2009 Feb 25.
BACKGROUND: In a recent, controversial essay published in PLoS Medicine, JPA Ioannidis argued that in some research fields most of the published findings are false. Theoretical reasoning shows that small effect sizes, error-prone tests, low prior probabilities of the tested hypotheses, and biases in the evaluation and publication of research findings all increase the fraction of false positives. These findings raise concerns about the reliability of research. However, they are based on a very simple scenario of scientific research in which single tests are used to evaluate independent hypotheses.
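The quantitative core of this argument is not spelled out in the abstract; as an illustrative aside, with significance level \alpha, power 1-\beta, and prior probability \pi that a tested hypothesis is true, the expected fraction of true findings among the positives is

PPV = \frac{(1-\beta)\,\pi}{(1-\beta)\,\pi + \alpha\,(1-\pi)},

so the fraction of false positives, 1 - PPV, grows as \pi shrinks, as power drops (small effects, error-prone tests), or as bias inflates the effective \alpha.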
METHODOLOGY/PRINCIPAL FINDINGS: In this study, we present computer simulations and experimental approaches for analyzing more realistic scenarios, in which research tasks are solved sequentially, i.e., subsequent tests can be chosen depending on previous results. We investigate simple sequential testing as well as scenarios where only a selected subset of results can be published and used in future rounds of test choice. Results from the computer simulations indicate that, for the tasks analyzed in this study, the fraction of false findings among the positive findings declines over several rounds of testing if the most informative tests are performed. Our experiments show that human subjects frequently choose the most informative tests, leading to the decline in false positives expected from the simulations.
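The abstract does not reproduce the simulation details. As a minimal illustrative sketch only (the parameters prior, alpha, and power and the repeated-retesting scheme are assumptions, not the authors' model), the following Python snippet shows the basic mechanism by which successive rounds of reasonably powerful testing filter false positives out of the pool of positive findings:

import random

def simulate(rounds=3, n=100000, prior=0.1, alpha=0.05, power=0.8, seed=1):
    """Track the fraction of false findings among hypotheses that keep
    testing positive over successive, independent rounds of testing."""
    rng = random.Random(seed)
    # Each hypothesis is a real effect with probability `prior`.
    pool = [rng.random() < prior for _ in range(n)]
    for r in range(1, rounds + 1):
        # A real effect is detected with probability `power`; a null
        # hypothesis yields a false positive with probability `alpha`.
        survivors = [h for h in pool if rng.random() < (power if h else alpha)]
        false_fraction = 1 - sum(survivors) / len(survivors)
        print(f"round {r}: {len(survivors)} positives, fraction false = {false_fraction:.3f}")
        pool = survivors

simulate()

With these placeholder parameters, the fraction of false positives drops sharply from round to round, which is the qualitative pattern the abstract reports when the most informative tests are performed.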
CONCLUSIONS/SIGNIFICANCE: For the research tasks studied here, findings tend to become more reliable over time. We also find that performance in the experimental settings in which not all performed tests could be published was surprisingly inefficient. Our results may help optimize existing procedures used in the practice of scientific research and provide guidance for the development of novel forms of scholarly communication.