Department of Mathematics, Loyola Marymount University, Los Angeles, California, United States of America.
Tempest Technologies, Los Angeles, California, United States of America.
PLoS One. 2024 May 16;19(5):e0303262. doi: 10.1371/journal.pone.0303262. eCollection 2024.
In recent years, concern has grown about the inappropriate application and interpretation of P values, especially the use of P<0.05 to denote "statistical significance" and the practice of P-hacking, in which researchers produce results below this threshold and selectively report them in publications. Such behavior is said to be a major contributor to the large number of false and non-reproducible discoveries found in academic journals. In response, it has been proposed that the threshold for statistical significance be changed from 0.05 to 0.005. The aim of the current study was to explore the impact of a 0.005 P value threshold on P-hacking and published false positive rates, using an evolutionary agent-based model composed of researchers who test hypotheses and strive to increase their publication rates. Three scenarios were examined: one in which researchers tested a single hypothesis, one in which they tested multiple hypotheses using a P<0.05 threshold, and one in which they tested multiple hypotheses using a P<0.005 threshold. Effect sizes were varied across models, and output was assessed in terms of researcher effort, number of hypotheses tested, number of publications, and the published false positive rate. The results supported the view that a more stringent P value threshold can reduce the rate of published false positive results. Researchers still engaged in P-hacking under the new threshold, but the effort they expended increased substantially and their overall productivity fell, resulting in a decline in the published false positive rate. Compared to other proposed interventions to improve the academic publishing system, changing the P value threshold has the advantage of being relatively easy to implement, and it could be monitored and enforced with minimal effort by journal editors and peer reviewers.
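The link between the threshold, testing many hypotheses, and researcher effort can be illustrated with a back-of-the-envelope calculation. The sketch below is not the study's evolutionary agent-based model. It assumes every tested hypothesis is a true null and that tests are independent, so each P value is uniform on [0, 1]. The function names and the choice of 10 hypotheses are hypothetical and used only for illustration.

```python
import random


def prob_any_false_positive(alpha: float, k: int) -> float:
    """Chance that at least one of k independent true-null tests gives p < alpha."""
    return 1.0 - (1.0 - alpha) ** k


def expected_tests_until_hit(alpha: float) -> float:
    """Mean number of true-null tests run until the first p < alpha (geometric mean)."""
    return 1.0 / alpha


def simulate_any_false_positive(alpha: float, k: int,
                                trials: int = 20000, seed: int = 0) -> float:
    """Monte Carlo check of prob_any_false_positive.

    Assumption: under a true null hypothesis the p-value is uniform on [0, 1],
    so each test is modelled as a single uniform random draw.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # One simulated "researcher" tests k null hypotheses and keeps any hit.
        if any(rng.random() < alpha for _ in range(k)):
            hits += 1
    return hits / trials


if __name__ == "__main__":
    k = 10  # hypothetical number of hypotheses tested per study
    for alpha in (0.05, 0.005):
        exact = prob_any_false_positive(alpha, k)
        approx = simulate_any_false_positive(alpha, k)
        effort = expected_tests_until_hit(alpha)
        print(f"alpha={alpha}: P(any false positive in {k} tests) = {exact:.4f} "
              f"(simulated {approx:.4f}); mean tests until first hit = {effort:.0f}")
```

Under these simplifying assumptions, moving from 0.05 to 0.005 raises the expected number of null tests needed for one "significant" result tenfold, which is consistent in direction with the increased effort reported above. The study's own figures come from its agent-based model and are not reproduced here.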