Inappropriate use of statistical power.

Affiliation

Medical College of Wisconsin, Milwaukee, WI, USA.

Publication information

Bone Marrow Transplant. 2023 May;58(5):474-477. doi: 10.1038/s41409-023-01935-3. Epub 2023 Mar 3.

DOI: 10.1038/s41409-023-01935-3
PMID: 36869191

Abstract

We are pleased to add this typescript, "Inappropriate use of statistical power" by Raphael Fraser, to the BONE MARROW TRANSPLANTATION Statistics Series. The author discusses how we sometimes misuse statistical analyses after a study is completed and analyzed to explain the results. The most egregious example is post hoc power calculations.

When the conclusion of an observational study or clinical trial is negative, namely, the data observed (or more extreme data) fail to reject the null hypothesis, people often argue for calculating the observed statistical power. This is especially true of clinical trialists who believe in a new therapy and who wished and hoped for a favorable outcome (rejecting the null hypothesis). One is reminded of the saying from Benjamin Franklin: "A man convinced against his will is of the same opinion still."

As the author notes, when we face a negative conclusion of a clinical trial there are two possibilities: (1) there is no treatment effect; or (2) we made a mistake. By calculating the observed power after the study, people (incorrectly) believe that if the observed power is high there is strong support for the null hypothesis. However, the problem is usually the opposite: if the observed power is low, people argue the null hypothesis was not rejected because there were too few subjects. This is usually couched in terms such as "there was a trend towards…" or "we failed to detect a benefit because we had too few subjects" or the like. Observed power should not be used to interpret the results of a negative study. Put more strongly, observed power should not be calculated after a study is completed and analyzed. The power of the study to reject or not reject the null hypothesis is already incorporated in the calculation of the p value.

The author uses interesting analogies to make important points about hypothesis testing. Testing the null hypothesis is like a jury trial. The jury can find the defendant guilty or not guilty. They cannot find him innocent. It is always important to recall that failure to reject the null hypothesis does not mean the null hypothesis is true, only that there is insufficient evidence (data) to reject it. As the author notes: "In a sense, hypothesis testing is like world championship boxing where the null hypothesis is the champion until defeated by the challenger, the alternative hypothesis, to become the new world champion."

The author includes a discussion of what a p-value is, a topic we discussed before in this series and elsewhere [1, 2]. Finally, there is a nice discussion of confidence intervals (frequentist) and credibility limits (Bayesian). A frequentist interpretation views probability as the limit of the relative frequency of an event after many trials. In contrast, a Bayesian interpretation views probability as a degree of belief in an event. This belief could be based on prior knowledge such as the results of previous trials, biological plausibility or personal beliefs (my drug is better than your drug). The important point is the common misinterpretation of confidence intervals. For example, many researchers interpret a 95 percent confidence interval to mean there is a 95 percent chance this interval contains the parameter value. This is wrong. It means that if we repeat the identical study many times, 95 percent of the intervals will contain the true but unknown parameter in the population. This will seem strange to many people because we are interested only in the study we are analyzing, not in repeating the same study design many times.

We hope readers will enjoy this well-written summary of common statistical errors, especially post hoc calculations of observed power. Going forward, we hope to ban statements like "there was a trend towards…" or "we failed to detect a benefit because we had too few subjects" from the Journal. Reviewers have been advised. Proceed at your own risk.

Robert Peter Gale MD, PhD, DSc(hc), FACP, FRCP, FRCPI(hon), FRSM, Imperial College London; Mei-Jie Zhang PhD, Medical College of Wisconsin.
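
Two claims in the abstract can be checked numerically: that observed power adds nothing beyond the p value, and that a 95 percent confidence interval describes long-run coverage across repeated studies. The Python sketch below is an illustration only, not the author's analysis. It assumes a two-sided z-test with known variance, and the function names `observed_power` and `ci_coverage` are hypothetical helpers introduced here. Under that assumption, observed power is a fixed, decreasing function of the p value, so any non-significant result (p > alpha) has observed power below about 0.5.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def observed_power(p_value: float, alpha: float = 0.05) -> float:
    """Post hoc ("observed") power of a two-sided z-test, computed from p alone.

    Assumption: the observed effect is plugged in as if it were the true
    effect, which is how observed power is usually calculated.
    """
    if not 0.0 < p_value <= 1.0:
        raise ValueError("p_value must be in (0, 1]")
    z_obs = -STD_NORMAL.inv_cdf(p_value / 2.0)     # |z| implied by the p value
    z_crit = STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)  # rejection threshold
    # Probability that a new z statistic centred at z_obs lands in either tail.
    upper = 1.0 - STD_NORMAL.cdf(z_crit - z_obs)
    lower = STD_NORMAL.cdf(-z_crit - z_obs)
    return upper + lower


def ci_coverage(true_mean: float = 0.0, sigma: float = 1.0, n: int = 30,
                repeats: int = 4000, level: float = 0.95, seed: int = 0) -> float:
    """Fraction of repeated-study z-intervals that contain the true mean."""
    rng = random.Random(seed)
    z = STD_NORMAL.inv_cdf(0.5 + level / 2.0)
    half_width = z * sigma / math.sqrt(n)  # known-sigma interval half-width
    hits = 0
    for _ in range(repeats):
        sample_mean = sum(rng.gauss(true_mean, sigma) for _ in range(n)) / n
        if abs(sample_mean - true_mean) <= half_width:
            hits += 1
    return hits / repeats


if __name__ == "__main__":
    for p in (0.05, 0.10, 0.20, 0.50):
        print(f"p = {p:.2f} -> observed power = {observed_power(p):.3f}")
    print(f"simulated coverage of 95% intervals: {ci_coverage():.3f}")
```

Because `observed_power` takes only the p value and alpha as inputs, reporting it after a negative study restates the p value in another unit. The coverage simulation shows the frequentist reading of a confidence interval: the 95 percent figure is a property of the procedure over many repetitions, not of any single computed interval.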

Similar articles

1
Inappropriate use of statistical power.
Bone Marrow Transplant. 2023 May;58(5):474-477. doi: 10.1038/s41409-023-01935-3. Epub 2023 Mar 3.
2
Folic acid supplementation and malaria susceptibility and severity among people taking antifolate antimalarial drugs in endemic areas.
Cochrane Database Syst Rev. 2022 Feb 1;2(2022):CD014217. doi: 10.1002/14651858.CD014217.
3
[Standard technical specifications for methacholine chloride (Methacholine) bronchial challenge test (2023)].
Zhonghua Jie He He Hu Xi Za Zhi. 2024 Feb 12;47(2):101-119. doi: 10.3760/cma.j.cn112147-20231019-00247.
4
Quantile regression for censored data in haematopoietic cell transplant research.
Bone Marrow Transplant. 2022 Jun;57(6):853-856. doi: 10.1038/s41409-022-01627-4. Epub 2022 Mar 24.
5
Precision medicine: Statistical methods for estimating adaptive treatment strategies.
Bone Marrow Transplant. 2020 Oct;55(10):1890-1896. doi: 10.1038/s41409-020-0871-z. Epub 2020 Apr 14.
6
To P or Not to P: Backing Bayesian Statistics.
Otolaryngol Head Neck Surg. 2017 Dec;157(6):915-918. doi: 10.1177/0194599817739260.
7
P value and the theory of hypothesis testing: an explanation for new researchers.
Clin Orthop Relat Res. 2010 Mar;468(3):885-92. doi: 10.1007/s11999-009-1164-4.
8
Statistics in ophthalmology revisited: the (effect) size matters.
Acta Ophthalmol. 2018 Nov;96(7):e885-e888. doi: 10.1111/aos.13756. Epub 2018 Sep 5.
9
Biostatistics Series Module 2: Overview of Hypothesis Testing.
Indian J Dermatol. 2016 Mar-Apr;61(2):137-45. doi: 10.4103/0019-5154.177775.
10
Cubic splines to model relationships between continuous variables and outcomes: a guide for clinicians.
Bone Marrow Transplant. 2020 Apr;55(4):675-680. doi: 10.1038/s41409-019-0679-x. Epub 2019 Oct 1.

Cited by

1
Real-world effectiveness of a single conventional disease-modifying anti-rheumatic drug (cDMARD) plus an anti-TNF agent versus multiple cDMARDs in rheumatoid arthritis: a prospective observational study.
J Rheum Dis. 2024 Apr 1;31(2):86-96. doi: 10.4078/jrd.2023.0045. Epub 2024 Jan 29.
2
Power analyses for measurement model misspecification and response shift detection with structural equation modeling.
Qual Life Res. 2024 May;33(5):1241-1256. doi: 10.1007/s11136-024-03605-3. Epub 2024 Mar 1.
3
The relationship between obesity associated weight-adjusted waist index and the prevalence of hypertension in US adults aged ≥60 years: a brief report.
Front Public Health. 2023 Oct 6;11:1210669. doi: 10.3389/fpubh.2023.1210669. eCollection 2023.

References

1
What is the (p-) value of the P-value?
Leukemia. 2016 Oct;30(10):1965-1967. doi: 10.1038/leu.2016.193. Epub 2016 Aug 26.
2
What is the P-value anyway?
Bone Marrow Transplant. 2016 Nov;51(11):1439-1440. doi: 10.1038/bmt.2016.184. Epub 2016 Jul 11.
3
Statistical tests, P values, confidence intervals, and power: a guide to misinterpretations.
Eur J Epidemiol. 2016 Apr;31(4):337-50. doi: 10.1007/s10654-016-0149-3. Epub 2016 May 21.
4
Nonsignificance plus high power does not imply support for the null over the alternative.
Ann Epidemiol. 2012 May;22(5):364-8. doi: 10.1016/j.annepidem.2012.02.007. Epub 2012 Mar 3.
5
The statistical power of abnormal-social psychological research: a review.
J Abnorm Soc Psychol. 1962 Sep;65:145-53. doi: 10.1037/h0045186.
6
Power is indeed irrelevant in interpreting completed studies.
BMJ. 2002 Nov 30;325(7375):1304.