Farrokh Habibzadeh
Independent Research Consultant, Shiraz, Iran.
PLoS One. 2025 Jun 13;20(6):e0325920. doi: 10.1371/journal.pone.0325920. eCollection 2025.
The reproducibility crisis is among the major concerns of many scientists worldwide. Some researchers attribute the crisis largely to the conventional p-value significance threshold, arbitrarily set at 0.05, and propose lowering the cut-off to 0.005. Although reducing the cut-off decreases the false-positive rate, it also increases the false-negative rate. Recently, a flexible p-value significance threshold that minimizes the weighted sum of errors in statistical hypothesis tests was proposed.
The current in silico study was conducted to compare error rates under three p-value significance thresholds: 0.05, 0.005, and a flexible threshold. Using Monte Carlo simulation, the false-positive rate (when the null hypothesis was true) and the false-negative rate (when the alternative hypothesis was true) were calculated for a hypothetical randomized clinical trial.
Increasing the study sample size reduced the false-negative rate; when fixed significance thresholds were used, however, the false-positive rate remained at a fixed value regardless of sample size, whereas it decreased when the flexible threshold was employed. Although employing the flexible threshold largely resolved the reproducibility crisis, the method uncovered an inherent conflict in the frequentist statistical inference framework: the flexible p-value significance threshold can be calculated only a posteriori, after the results are obtained. The threshold would therefore differ even across exact replications, which contradicts common sense.
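The simulation design described above can be sketched as follows. This is a minimal illustration rather than the author's actual code: the effect size, unit variance, trial counts, and the use of a two-sided z-test with known variance are all assumptions made for brevity. It reproduces the two fixed-threshold behaviors the abstract reports: the false-positive rate stays near the chosen threshold regardless of sample size, while the false-negative rate falls as the sample size grows.

```python
import math
import random

random.seed(42)


def norm_cdf(x):
    # Standard normal CDF via the error function (stdlib only).
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def two_sided_p(mean_a, mean_b, n):
    # Two-sided z-test for two independent means, assuming known unit
    # variance in each arm (a simplification of a real trial analysis).
    z = (mean_b - mean_a) / math.sqrt(2.0 / n)
    return 2.0 * (1.0 - norm_cdf(abs(z)))


def error_rates(n, effect=0.5, alpha=0.05, trials=4000):
    """Monte Carlo estimate of false-positive and false-negative rates
    for a two-arm trial with n patients per arm at threshold alpha."""
    fp = fn = 0
    for _ in range(trials):
        # Null hypothesis true: both arms share the same distribution.
        a = [random.gauss(0.0, 1.0) for _ in range(n)]
        b = [random.gauss(0.0, 1.0) for _ in range(n)]
        if two_sided_p(sum(a) / n, sum(b) / n, n) < alpha:
            fp += 1
        # Alternative true: the treatment arm is shifted by `effect`.
        a = [random.gauss(0.0, 1.0) for _ in range(n)]
        b = [random.gauss(effect, 1.0) for _ in range(n)]
        if two_sided_p(sum(a) / n, sum(b) / n, n) >= alpha:
            fn += 1
    return fp / trials, fn / trials


for n in (20, 50, 100):
    for alpha in (0.05, 0.005):
        fpr, fnr = error_rates(n, alpha=alpha)
        print(f"n={n:3d}  alpha={alpha:.3f}  FPR~{fpr:.3f}  FNR~{fnr:.3f}")
```

The flexible threshold discussed in the abstract is not implemented here, since its weighting scheme is defined in the cited proposal; the fixed-threshold runs alone show the trade-off that lowering 0.05 to 0.005 shrinks the false-positive rate at the cost of a larger false-negative rate.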
It seems that relying on frequentist statistical inference and the p value is no longer a viable approach. Emphasis should shift toward alternative approaches to data analysis, such as Bayesian statistical methods.