Interpreting frequentist hypothesis tests: insights from Bayesian inference.

Affiliations

Department of Anaesthesia and the Cardiothoracic and Vascular Intensive Care Unit, Auckland City Hospital, Auckland, New Zealand.

Faculty of Medical and Health Sciences, University of Auckland, Auckland, New Zealand.

Publication information

Can J Anaesth. 2023 Oct;70(10):1560-1575. doi: 10.1007/s12630-023-02557-5. Epub 2023 Oct 4.

Abstract

Randomized controlled trials are one of the best ways of quantifying the effectiveness of medical interventions. Therefore, when the authors of a randomized superiority trial report that differences in the primary outcome between the intervention group and the control group are "significant" (i.e., P ≤ 0.05), we might assume that the intervention has an effect on the outcome. Similarly, when differences between the groups are "not significant," we might assume that the intervention does not have an effect on the outcome. Nevertheless, both assumptions are frequently incorrect.

In this article, we explore the relationship that exists between real treatment effects and declarations of statistical significance based on P values and confidence intervals. We explain why, in some circumstances, the chance an intervention is ineffective when P ≤ 0.05 exceeds 25% and the chance an intervention is effective when P > 0.05 exceeds 50%.

Over the last decade, there has been increasing interest in Bayesian methods as an alternative to frequentist hypothesis testing. We provide a robust but nontechnical introduction to Bayesian inference and explain why a Bayesian posterior distribution overcomes many of the problems associated with frequentist hypothesis testing.

Notwithstanding the current interest in Bayesian methods, frequentist hypothesis testing remains the default method for statistical inference in medical research. Therefore, we propose an interim solution to the "significance problem" based on simplified Bayesian metrics (e.g., Bayes factor, false positive risk) that can be reported along with traditional P values and confidence intervals. We calculate these metrics for four well-known multicentre trials. We provide links to online calculators so readers can easily estimate these metrics for published trials. In this way, we hope decisions on incorporating the results of randomized trials into clinical practice can be enhanced, minimizing the chance that useful treatments are discarded or that ineffective treatments are adopted.
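
The claim that an intervention can be ineffective more than 25% of the time when P ≤ 0.05, or effective more than 50% of the time when P > 0.05, can be illustrated with Bayes' theorem applied to a significance threshold. The sketch below is not taken from the article and may differ from the method behind its online calculators: it treats "P ≤ 0.05" as a diagnostic test result, and the function names, the prior probabilities, and the power values are all hypothetical choices made for illustration.

```python
def false_positive_risk(prior, power, alpha=0.05):
    """P(no real effect | significant result), by Bayes' theorem.

    prior: assumed probability, before the trial, that the effect is real
    power: assumed probability of a significant result if the effect is real
    alpha: significance threshold (probability of a significant result
           if there is no effect)
    """
    false_positives = alpha * (1.0 - prior)
    true_positives = power * prior
    return false_positives / (false_positives + true_positives)


def false_negative_risk(prior, power, alpha=0.05):
    """P(real effect | non-significant result), by Bayes' theorem."""
    missed_effects = (1.0 - power) * prior
    true_negatives = (1.0 - alpha) * (1.0 - prior)
    return missed_effects / (missed_effects + true_negatives)


if __name__ == "__main__":
    # Hypothetical scenario 1: a long-shot intervention (prior 10%)
    # tested in a well-powered trial (power 80%).
    print(f"{false_positive_risk(prior=0.1, power=0.8):.2f}")  # 0.36

    # Hypothetical scenario 2: a plausible intervention (prior 60%)
    # tested in an underpowered trial (power 20%).
    print(f"{false_negative_risk(prior=0.6, power=0.2):.2f}")  # 0.56
```

Under these assumed inputs, a "significant" result still leaves a 36% chance that the intervention is ineffective, and a "not significant" result leaves a 56% chance that it is effective. Both numbers depend heavily on the assumed prior and power, which is why the same P value can support very different conclusions in different trials.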

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/66eb/10600289/8e9e72974ef4/12630_2023_2557_Fig1_HTML.jpg
