Global Virus Network, Middle East Region of Global Virus Network (GVN), Shiraz, Iran.
J Transl Med. 2024 Jan 4;22(1):16. doi: 10.1186/s12967-023-04827-8.
The p value is the most commonly reported statistic in scientific research articles. The conventional significance threshold of 0.05 used for the p value in research articles is unfounded. Many researchers have tried to provide a reasonable threshold for the p value; some have proposed a lower threshold, e.g., 0.005. However, none of the proposals has gained universal acceptance. Using the analogy between diagnostic tests with continuous results and statistical tests of hypotheses, I present a method to calculate the most appropriate p value significance threshold using receiver operating characteristic (ROC) curve analysis.
As with diagnostic tests, where the most appropriate cut-off value differs depending on the situation, there is no unique cut-off for the p value significance threshold. Unlike previous proposals, which mostly suggest lowering the threshold to a fixed value (e.g., from 0.05 to 0.005), the most appropriate p value significance threshold proposed here is, in most instances, much lower than the conventional cut-off of 0.05 and varies from study to study and from statistical test to test, even within a single study. The proposed method minimizes the weighted sum of type I and type II errors.
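The idea of choosing a threshold that minimizes a weighted sum of type I and type II errors can be illustrated with a minimal sketch. It is not the author's published procedure, which the abstract does not spell out. It assumes a one-sided, one-sample z-test with a known standardized effect size, and it introduces hypothetical parameters `prior_h1` (the prior probability that the alternative hypothesis is true) and `cost_ratio` (the cost of a false positive relative to a false negative) as the weights. Each candidate threshold corresponds to one point on the ROC curve, with the type I error as the false-positive rate and the power as the sensitivity.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution


def type_ii_error(alpha, effect_size, n):
    """Type II error (beta) of a one-sided, one-sample z-test at level alpha.

    Assumes a known standardized effect size under the alternative hypothesis.
    """
    z_crit = _STD_NORMAL.inv_cdf(1.0 - alpha)
    return _STD_NORMAL.cdf(z_crit - effect_size * n ** 0.5)


def best_threshold(effect_size, n, prior_h1=0.5, cost_ratio=1.0, grid_size=10000):
    """Grid-search the threshold that minimizes the weighted error sum.

    Loss (an assumed weighting, for illustration only):
        cost_ratio * (1 - prior_h1) * alpha + prior_h1 * beta(alpha)
    """
    best_alpha, best_loss = None, float("inf")
    for i in range(1, grid_size):
        alpha = i / grid_size  # candidate threshold strictly inside (0, 1)
        beta = type_ii_error(alpha, effect_size, n)
        loss = cost_ratio * (1.0 - prior_h1) * alpha + prior_h1 * beta
        if loss < best_loss:
            best_alpha, best_loss = alpha, loss
    return best_alpha, best_loss


if __name__ == "__main__":
    # Same effect size, different sample sizes: the minimizing threshold differs.
    for n in (16, 100):
        alpha, loss = best_threshold(effect_size=0.5, n=n)
        print(f"n={n}: threshold={alpha:.4f}, weighted error={loss:.4f}")
```

Under these assumptions and with equal weights, the minimizing threshold for an effect size of 0.5 and a sample size of 100 is about 0.006, well below 0.05, whereas the same effect size with a sample size of 16 gives a threshold above 0.05. This illustrates why a single fixed cut-off cannot be best for every test.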
Given the difficulty of using frequentist statistics correctly (dealing with different p value significance thresholds, even within a single study), it seems that the p value is no longer a proper statistic for use in research; it should be replaced by alternative methods, e.g., Bayesian methods.