Global Virus Network, Middle East Region, Shiraz, Iran.
PLoS One. 2024 Jun 14;19(6):e0305575. doi: 10.1371/journal.pone.0305575. eCollection 2024.
Randomized clinical trials (RCTs) shape our clinical practice. Several studies report a mediocre replicability rate for the RCTs they examined. Many researchers attribute this relatively low replication rate to the high p value significance threshold. To solve this problem, some researchers have proposed using a lower threshold, which is inevitably associated with a decrease in study power.
The results of 22 500 RCTs retrieved from the Cochrane Database of Systematic Reviews (CDSR) were reinterpreted using two fixed p value significance thresholds (0.05 and 0.005) and a recently proposed flexible threshold that minimizes the weighted sum of errors in statistical inference.
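To make the idea of a threshold that minimizes the weighted sum of errors concrete, the following is a minimal sketch and not the authors' procedure. It assumes a two-sided, two-sample z-test with equal arms and known unit variance, and it weights the type I and type II error rates equally by default. The function names, the effect size, the sample sizes, and the grid search are all illustrative assumptions.

```python
from statistics import NormalDist

_STD = NormalDist()  # standard normal distribution


def type_ii_error(alpha, effect_size, n_per_arm):
    """Beta for a two-sided, two-sample z-test (assumed: equal arms, unit variance)."""
    z_crit = _STD.inv_cdf(1 - alpha / 2)
    # Mean of the test statistic under the alternative hypothesis.
    shift = effect_size * (n_per_arm / 2) ** 0.5
    # Probability that the statistic stays inside the acceptance region under H1.
    return _STD.cdf(z_crit - shift) - _STD.cdf(-z_crit - shift)


def weighted_error(alpha, effect_size, n_per_arm, weight_alpha=0.5):
    """Weighted sum of the type I (alpha) and type II (beta) error rates."""
    beta = type_ii_error(alpha, effect_size, n_per_arm)
    return weight_alpha * alpha + (1 - weight_alpha) * beta


def flexible_threshold(effect_size, n_per_arm, weight_alpha=0.5, grid=10_000):
    """Grid search for the alpha in (0, 1) that minimizes the weighted error."""
    candidates = [i / grid for i in range(1, grid)]
    return min(
        candidates,
        key=lambda a: weighted_error(a, effect_size, n_per_arm, weight_alpha),
    )


if __name__ == "__main__":
    # Hypothetical designs: standardized effect size 0.5 at two sample sizes.
    for n in (64, 256):
        best = flexible_threshold(0.5, n)
        print(f"n per arm = {n}: alpha = {best:.4f}, "
              f"weighted error = {weighted_error(best, 0.5, n):.4f}")
```

Under these assumptions the minimizing threshold depends on the effect size, the sample size, and the variance, which is why a single fixed cut-off cannot be optimal for every trial.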
With the p < 0.05 criterion, 28.5% of RCTs were significant; with p < 0.005, 14.2%; and with p < flexible threshold, 9.9% (about two-thirds of the RCTs significant under the p < 0.05 criterion were no longer significant). Although lowering the p cut-off decreases the false-positive rate, it is not generally associated with a lower weighted sum of errors: the false-negative rate increases (the study power decreases), so important treatments may be left undiscovered. Accurate calculation of the optimal p value threshold requires knowledge of the variance in each study arm, which is available only a posteriori.
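The "about two-thirds" figure can be checked from the reported percentages alone. The short sketch below does that arithmetic. It assumes that the results significant under a stricter criterion are a subset of those significant at p < 0.05, which holds for the fixed 0.005 cut-off but is only an approximation for the flexible threshold. The helper name `share_lost` is hypothetical.

```python
# Percentages of significant RCTs reported in the abstract.
significant_pct = {"p < 0.05": 28.5, "p < 0.005": 14.2, "p < flexible": 9.9}


def share_lost(baseline_pct, stricter_pct):
    """Fraction of baseline-significant results lost under a stricter criterion.

    Assumes the stricter set is nested inside the baseline set.
    """
    return 1 - stricter_pct / baseline_pct


baseline = significant_pct["p < 0.05"]
for label in ("p < 0.005", "p < flexible"):
    lost = share_lost(baseline, significant_pct[label])
    print(f"{label}: {lost:.1%} of p < 0.05 findings no longer significant")
```

With these inputs, roughly half of the p < 0.05 findings lose significance at 0.005, and roughly two-thirds lose it under the flexible threshold.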
Lowering the p value threshold, as proposed by some researchers, is not reasonable because it might be associated with an increase in the false-negative rate. Using a flexible p significance threshold, although it results in a minimum error in statistical inference, might not be good enough either, because only a rough estimate can be calculated a priori; the data necessary for precise computation of the most appropriate p significance threshold are available only a posteriori. The frequentist statistical framework has an inherent conflict. Alternative methods, such as Bayesian methods, although not perfect, would be more appropriate for the data analysis of RCTs.