Analysis goals, error-cost sensitivity, and analysis hacking: Essential considerations in hypothesis testing and multiple comparisons.

Author information

Sander Greenland

Affiliation

Department of Epidemiology and Department of Statistics, University of California, Los Angeles, CA, USA.

Publication information

Paediatr Perinat Epidemiol. 2021 Jan;35(1):8-23. doi: 10.1111/ppe.12711. Epub 2020 Dec 2.

DOI:10.1111/ppe.12711

PMID:33269490

Abstract

The "replication crisis" has been attributed to perverse incentives that lead to selective reporting and misinterpretations of P-values and confidence intervals. A crude fix offered for this problem is to lower testing cut-offs (α levels), either directly or in the form of null-biased multiple comparisons procedures such as naïve Bonferroni adjustments. Methodologists and statisticians have expressed positions that range from condemning all such procedures to demanding their application in almost all analyses. Navigating between these unjustifiable extremes requires defining analysis goals precisely enough to separate inappropriate from appropriate adjustments. To meet this need, I here review issues arising in single-parameter inference (such as error costs and loss functions) that are often skipped in basic statistics, yet are crucial to understanding controversies in testing and multiple comparisons. I also review considerations that should be made when examining arguments for and against modifications of decision cut-offs and adjustments for multiple comparisons. The goal is to provide researchers a better understanding of what is assumed by each side and to enable recognition of hidden assumptions. Basic issues of goal specification and error costs are illustrated with simple fixed cut-off hypothesis testing scenarios. These illustrations show how adjustment choices are extremely sensitive to implicit decision costs, making it inevitable that different stakeholders will vehemently disagree about what is necessary or appropriate. Because decisions cannot be justified without explicit costs, resolution of inference controversies is impossible without recognising this sensitivity. Pre-analysis statements of funding, scientific goals, and analysis plans can help counter demands for inappropriate adjustments, and can provide guidance as to what adjustments are advisable. 
Hierarchical (multilevel) regression methods (including Bayesian, semi-Bayes, and empirical-Bayes methods) provide preferable alternatives to conventional adjustments, insofar as they facilitate use of background information in the analysis model, and thus can provide better-informed estimates on which to base inferences and decisions.
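The sensitivity of cut-off choices to error costs described in the abstract can be illustrated with a minimal sketch. This is not the article's own method: it assumes a one-sided z-test, and the cost values, the effect size `effect_z` (in standard-error units), the prior weight `prior_null`, and all function names are hypothetical choices made only for illustration.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def expected_loss(alpha, cost_false_pos, cost_false_neg,
                  effect_z=2.5, prior_null=0.5):
    """Expected loss of a one-sided z-test run at level alpha.

    All inputs are illustrative assumptions: effect_z is the true
    effect under the alternative in standard-error units, and
    prior_null is the weight placed on the null being true.
    """
    z_crit = STD_NORMAL.inv_cdf(1 - alpha)        # rejection cut-off
    beta = STD_NORMAL.cdf(z_crit - effect_z)      # type II error probability
    loss_if_null = cost_false_pos * alpha         # false-positive cost
    loss_if_alt = cost_false_neg * beta           # false-negative cost
    return prior_null * loss_if_null + (1 - prior_null) * loss_if_alt


def best_alpha(cost_false_pos, cost_false_neg,
               effect_z=2.5, prior_null=0.5):
    """Grid search for the alpha that minimises expected loss."""
    grid = [i / 1000 for i in range(1, 500)]      # 0.001 to 0.499
    return min(grid, key=lambda a: expected_loss(
        a, cost_false_pos, cost_false_neg, effect_z, prior_null))


def bonferroni_alpha(alpha, n_tests):
    """Naive Bonferroni cut-off: divide alpha by the number of tests."""
    return alpha / n_tests


if __name__ == "__main__":
    for fp, fn in [(10, 1), (1, 1), (1, 10)]:
        print(f"cost FP={fp}, cost FN={fn}: alpha = {best_alpha(fp, fn)}")
    print("Bonferroni cut-off for 10 tests at 0.05:",
          bonferroni_alpha(0.05, 10))
```

Under these assumed inputs, the loss-minimising α falls as false positives become relatively more costly and rises as false negatives become more costly, whereas the naive Bonferroni cut-off depends only on the number of tests and ignores costs entirely.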

Similar articles

The future of Cochrane Neonatal.

Early Hum Dev. 2020 Nov;150:105191. doi: 10.1016/j.earlhumdev.2020.105191. Epub 2020 Sep 12.

Is There a Free Lunch in Inference?

Top Cogn Sci. 2016 Jul;8(3):520-47. doi: 10.1111/tops.12214.

Empirical Bayes adjustments for multiple results in hypothesis-generating or surveillance studies.

Cancer Epidemiol Biomarkers Prev. 2000 Sep;9(9):895-903.

Innovations in Bayes and empirical Bayes methods: estimating parameters, populations and ranks.

Stat Med. 1999;18(17-18):2493-505. doi: 10.1002/(sici)1097-0258(19990915/30)18:17/18<2493::aid-sim271>3.0.co;2-s.

Empirical Bayes screening of many p-values with applications to microarray studies.

Bioinformatics. 2005 May 1;21(9):1987-94. doi: 10.1093/bioinformatics/bti301. Epub 2005 Feb 2.

Empirical-Bayes adjustments for multiple comparisons are sometimes useful.

Epidemiology. 1991 Jul;2(4):244-51. doi: 10.1097/00001648-199107000-00002.

Efficiency in sequential testing: Comparing the sequential probability ratio test and the sequential Bayes factor test.

Behav Res Methods. 2022 Dec;54(6):3100-3117. doi: 10.3758/s13428-021-01754-8. Epub 2022 Mar 1.

The Bayesian New Statistics: Hypothesis testing, estimation, meta-analysis, and power analysis from a Bayesian perspective.

Psychon Bull Rev. 2018 Feb;25(1):178-206. doi: 10.3758/s13423-016-1221-4.

Null hypothesis significance testing and effect sizes: can we 'effect' everything … or … anything?

Curr Opin Pharmacol. 2020 Apr;51:68-77. doi: 10.1016/j.coph.2019.12.001. Epub 2020 Jan 14.

Cited by

Clinical Decision Support System for Primary Care of Opioid Use Disorder: A Randomized Clinical Trial.

JAMA Intern Med. 2025 Jul 14. doi: 10.1001/jamainternmed.2025.2535.

Engagement in Ageless Gym Programs Among Older Adults in Rural Communities: A Retrospective Study on Relationships With Age, Health Conditions, and Proximity to Health Facilities.

J Aging Res. 2025 Jun 23;2025:2608531. doi: 10.1155/jare/2608531. eCollection 2025.

Associations between temporal lobe cortical NODDI measures and memory function in individuals without clinical dementia.

Alzheimers Dement. 2025 Jun;21(6):e70384. doi: 10.1002/alz.70384.

An exploratory study on the use of sexually transmitted infection prevention and contraception methods among women and men who use unprescribed opioids.

Drug Alcohol Depend Rep. 2025 Apr 22;15:100337. doi: 10.1016/j.dadr.2025.100337. eCollection 2025 Jun.

When to adjust for multiplicity in cancer clinical trials.

J Natl Cancer Inst Monogr. 2025 Mar 1;2025(68):3-9. doi: 10.1093/jncimonographs/lgae051.

Red Blood Cell-Related Phenotype-Genotype Correlations in Chronic and Acute Critical Illnesses (Traumatic Brain Injury Cohort and COVID-19 Cohort).

Int J Mol Sci. 2025 Jan 31;26(3):1239. doi: 10.3390/ijms26031239.

P>0.05 Is Good: The NORD-h Protocol for Several Hypothesis Analysis Based on Known Risks, Costs, and Benefits.

J Prev Med Public Health. 2024 Nov;57(6):511-520. doi: 10.3961/jpmph.24.250. Epub 2024 Sep 20.

Can expected error costs justify testing a hypothesis at multiple alpha levels rather than searching for an elusive optimal alpha?

PLoS One. 2024 Sep 25;19(9):e0304675. doi: 10.1371/journal.pone.0304675. eCollection 2024.

Pseudo-random Number Generator Influences on Average Treatment Effect Estimates Obtained with Machine Learning.

Epidemiology. 2024 Nov 1;35(6):779-786. doi: 10.1097/EDE.0000000000001785. Epub 2024 Aug 16.

For a proper use of frequentist inferential statistics in public health.

Glob Epidemiol. 2024 Jun 15;8:100151. doi: 10.1016/j.gloepi.2024.100151. eCollection 2024 Dec.
