Meta-Research Innovation Center at Stanford (METRICS), Stanford University, Stanford, California, United States of America.
Department of Clinical Psychology and Psychotherapy, Babes-Bolyai University, Cluj-Napoca, Romania.
PLoS One. 2018 May 15;13(5):e0197440. doi: 10.1371/journal.pone.0197440. eCollection 2018.
P values represent a widely used, but pervasively misunderstood and fiercely contested, method of scientific inference. Display items, such as figures and tables, often contain the main results and are an important source of P values. We conducted a survey comparing the overall use of P values and the occurrence of significant P values in display items of a sample of articles published in the three top multidisciplinary journals (Nature, Science, PNAS) in 2017 and in 1997. We also examined the reporting of multiplicity corrections and its potential influence on the proportion of statistically significant P values. Our findings demonstrated substantial and growing reliance on P values in display items, with increases of 2.5 to 14.5 times in 2017 compared with 1997. The overwhelming majority of P values (94%, 95% confidence interval [CI] 92% to 96%) were statistically significant. Methods to adjust for multiplicity were almost non-existent in 1997 but were reported in many of the 2017 articles that relied on P values (Nature 68%, Science 48%, PNAS 38%). When no multiplicity correction was described, almost all reported P values were statistically significant (98%, 95% CI 96% to 99%). Conversely, when any multiplicity correction was described, 88% (95% CI 82% to 93%) of reported P values were statistically significant. Use of Bayesian methods was scant (2.5%), and only rarely (0.7%) did articles rely exclusively on Bayesian statistics. Overall, the wider appreciation of the need for multiplicity corrections is a welcome evolution, but the rapid growth of reliance on P values and the implausibly high rates of reported statistical significance are worrisome.
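To illustrate why describing a multiplicity correction can go together with a lower share of significant P values, here is a minimal sketch of the Holm-Bonferroni step-down procedure, one common correction that controls the family-wise error rate. The abstract does not say which corrections the surveyed articles used, and the P values below are invented for illustration only; `holm_bonferroni` and `share_significant` are hypothetical helper names.

```python
from typing import List


def holm_bonferroni(p_values: List[float], alpha: float = 0.05) -> List[bool]:
    """Return a reject (True) / retain (False) decision per hypothesis using Holm's step-down method."""
    m = len(p_values)
    # Visit hypotheses from smallest to largest P value, remembering original positions.
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # The k-th smallest P value (0-based rank) is compared with alpha / (m - rank).
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            # Once one comparison fails, all larger P values are retained as well.
            break
    return reject


def share_significant(flags: List[bool]) -> float:
    """Fraction of tests flagged as significant (assumes a non-empty list)."""
    return sum(flags) / len(flags)


if __name__ == "__main__":
    # Hypothetical P values from a single display item (NOT data from the survey).
    p = [0.001, 0.004, 0.012, 0.020, 0.030, 0.045]
    unadjusted = [x < 0.05 for x in p]  # conventional P < 0.05 threshold, no correction
    adjusted = holm_bonferroni(p)
    print(f"unadjusted: {share_significant(unadjusted):.0%} significant")
    print(f"Holm-adjusted: {share_significant(adjusted):.0%} significant")
```

For these made-up values, all six P values pass the unadjusted threshold (100%), while only three survive Holm's procedure (50%). The direction of the change, not its size, is the point: a correction raises the bar for each individual test, so fewer reported results remain significant. Holm's method is uniformly at least as powerful as the plain Bonferroni correction; procedures that control the false discovery rate, such as Benjamini-Hochberg, are another common choice.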