On the effect of flexible adjustment of the p value significance threshold on the reproducibility of randomized clinical trials.

Author information

Habibzadeh Farrokh

Affiliation

Independent Research Consultant, Shiraz, Iran.

Publication information

PLoS One. 2025 Jun 13;20(6):e0325920. doi: 10.1371/journal.pone.0325920. eCollection 2025.

DOI: 10.1371/journal.pone.0325920
PMID: 40512828
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12165351/

Abstract

BACKGROUND

The reproducibility crisis is a major concern for many scientists worldwide. Some researchers attribute the crisis mostly to the conventional p value significance threshold, arbitrarily set at 0.05, and propose lowering the cut-off to 0.005. Lowering the cut-off decreases the false-positive rate but increases the false-negative rate. Recently, a flexible p value significance threshold was proposed that minimizes the weighted sum of errors in statistical hypothesis testing.

METHODS

This in silico study compared the error rates under three choices of the p value significance threshold: 0.05, 0.005, and a flexible threshold. Using a Monte Carlo simulation of a hypothetical randomized clinical trial, the false-positive rate (when the null hypothesis was true) and the false-negative rate (when the alternative hypothesis was true) were calculated.
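
The kind of simulation described above can be sketched roughly as follows. This is a minimal illustration, not the author's code: the two-arm design with a normally distributed outcome, the known-variance z-test, the effect size of 0.5 standard deviations, and the names `two_sided_p` and `error_rates` are all assumptions introduced here. It covers the two fixed thresholds only.

```python
import math
import random
from statistics import NormalDist, fmean

STD_NORMAL = NormalDist()


def two_sided_p(arm_a, arm_b, sigma):
    """Two-sided z-test p value for a difference in means.

    Assumes equal arm sizes and a known common standard deviation sigma.
    """
    n = len(arm_a)
    se = sigma * math.sqrt(2.0 / n)
    z = (fmean(arm_a) - fmean(arm_b)) / se
    return 2.0 * (1.0 - STD_NORMAL.cdf(abs(z)))


def error_rates(n_per_arm, effect, alpha, sigma=1.0, n_trials=2000, seed=1):
    """Estimate (false-positive rate, false-negative rate) by simulation.

    Each iteration simulates one trial in which the null hypothesis is true
    (no difference between arms) and one in which the alternative is true
    (treatment mean shifted by `effect`).
    """
    rng = random.Random(seed)

    def draw(mean):
        return [rng.gauss(mean, sigma) for _ in range(n_per_arm)]

    false_pos = 0
    false_neg = 0
    for _ in range(n_trials):
        # Null hypothesis true: any "significant" result is a false positive.
        if two_sided_p(draw(0.0), draw(0.0), sigma) < alpha:
            false_pos += 1
        # Alternative true: any "non-significant" result is a false negative.
        if two_sided_p(draw(effect), draw(0.0), sigma) >= alpha:
            false_neg += 1
    return false_pos / n_trials, false_neg / n_trials


if __name__ == "__main__":
    for n in (20, 80):
        for alpha in (0.05, 0.005):
            fpr, fnr = error_rates(n, effect=0.5, alpha=alpha)
            print(f"n={n:3d} alpha={alpha:<5} FPR={fpr:.3f} FNR={fnr:.3f}")
```

Because the seed fixes the simulated data, the two thresholds are compared on identical trials, so differences in the estimated rates come from the threshold alone.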

RESULTS

Increasing the study sample size was associated with a reduction in the false-negative rate. With fixed significance thresholds, however, the false-positive rate stayed at a fixed value regardless of the sample size; with the flexible threshold, the rate decreased. Although the flexible threshold abolished the reproducibility crisis to a large extent, the method uncovered an inherent conflict in the frequentist statistical inference framework: the flexible p value significance threshold can only be calculated a posteriori, after the results are obtained. The threshold would thus differ even between replicas, which contradicts common sense.
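
To make the idea of a threshold that minimizes the weighted sum of errors concrete, here is a rough sketch under stated assumptions. It is not the formula used in the paper: it assumes a two-sided z-test with known variance, takes the effect size as a given input, and finds the threshold by a simple grid search. The names `false_negative_rate`, `min_weighted_error_threshold`, and `weight_fp` are hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def false_negative_rate(alpha, n_per_arm, effect, sigma=1.0):
    """Type II error rate of a two-sided z-test at threshold alpha.

    Assumes equal arm sizes and a known common standard deviation sigma.
    """
    shift = effect / (sigma * math.sqrt(2.0 / n_per_arm))
    z_crit = STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    return STD_NORMAL.cdf(z_crit - shift) - STD_NORMAL.cdf(-z_crit - shift)


def min_weighted_error_threshold(n_per_arm, effect, weight_fp=0.5,
                                 sigma=1.0, grid_size=2000):
    """Grid-search the alpha minimizing weight_fp*alpha + (1 - weight_fp)*beta.

    The grid is log-spaced between 1e-6 and 0.5.
    """
    low, high = math.log10(1e-6), math.log10(0.5)
    best_alpha, best_cost = None, math.inf
    for i in range(grid_size + 1):
        alpha = 10.0 ** (low + (high - low) * i / grid_size)
        beta = false_negative_rate(alpha, n_per_arm, effect, sigma)
        cost = weight_fp * alpha + (1.0 - weight_fp) * beta
        if cost < best_cost:
            best_alpha, best_cost = alpha, cost
    return best_alpha


if __name__ == "__main__":
    for n in (20, 80, 320):
        alpha = min_weighted_error_threshold(n, effect=0.5)
        print(f"n={n:3d} threshold={alpha:.4f}")
```

In this sketch the minimizing threshold gets smaller as the sample size grows, and so does the false-positive rate it implies. The threshold also depends on the effect size: if that input is replaced by the effect observed in each study, two replicas with different observed effects get different thresholds.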

CONCLUSIONS

It seems that relying on frequentist statistical inference and the p value is no longer a viable approach. Emphasis should shift toward alternative approaches to data analysis, such as Bayesian statistical methods.

Figures

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fb6b/12165351/3559162df5e2/pone.0325920.g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fb6b/12165351/2b9c256423ca/pone.0325920.g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fb6b/12165351/31264fd59e1d/pone.0325920.g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fb6b/12165351/abd1e29a99bf/pone.0325920.g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fb6b/12165351/bd40cabe9206/pone.0325920.g005.jpg
