Sander Greenland
Department of Epidemiology and Department of Statistics, University of California, Los Angeles, CA, USA.
Paediatr Perinat Epidemiol. 2021 Jan;35(1):8-23. doi: 10.1111/ppe.12711. Epub 2020 Dec 2.
The "replication crisis" has been attributed to perverse incentives that lead to selective reporting and misinterpretations of P-values and confidence intervals. A crude fix offered for this problem is to lower testing cut-offs (α levels), either directly or in the form of null-biased multiple comparisons procedures such as naïve Bonferroni adjustments. Methodologists and statisticians have expressed positions that range from condemning all such procedures to demanding their application in almost all analyses. Navigating between these unjustifiable extremes requires defining analysis goals precisely enough to separate inappropriate from appropriate adjustments. To meet this need, I here review issues arising in single-parameter inference (such as error costs and loss functions) that are often skipped in basic statistics, yet are crucial to understanding controversies in testing and multiple comparisons. I also review considerations that should be made when examining arguments for and against modifications of decision cut-offs and adjustments for multiple comparisons. The goal is to provide researchers a better understanding of what is assumed by each side and to enable recognition of hidden assumptions. Basic issues of goal specification and error costs are illustrated with simple fixed cut-off hypothesis testing scenarios. These illustrations show how adjustment choices are extremely sensitive to implicit decision costs, making it inevitable that different stakeholders will vehemently disagree about what is necessary or appropriate. Because decisions cannot be justified without explicit costs, resolution of inference controversies is impossible without recognising this sensitivity. Pre-analysis statements of funding, scientific goals, and analysis plans can help counter demands for inappropriate adjustments, and can provide guidance as to what adjustments are advisable. 
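The abstract's central point, that a fixed-cutoff choice is really a hidden statement about error costs, can be illustrated with a small sketch. This is an illustrative construction, not code from the paper: it takes a two-action decision for a point null H0: μ = 0 against a point alternative H1: μ = δ from one N(μ, 1) observation, derives the Bayes rejection rule for an assumed false-positive/false-negative cost ratio and prior probability of the null, and reports the α level that rule implies, alongside a naïve Bonferroni per-comparison cut-off.

```python
import math

def norm_sf(z):
    # upper-tail probability of a standard normal variate
    return 0.5 * math.erfc(z / math.sqrt(2))

def bonferroni_alpha(alpha_family, m):
    # naive Bonferroni: per-comparison cut-off that bounds the
    # family-wise Type I error rate at alpha_family over m comparisons
    return alpha_family / m

def implied_alpha(cost_fp, cost_fn, prior_null, delta):
    """Type I error rate of the Bayes rule for H0: mu = 0 vs
    H1: mu = delta, given one x ~ N(mu, 1).  The rule rejects H0
    when the likelihood ratio exp(delta*x - delta**2/2) exceeds
    k = (cost_fp * prior_null) / (cost_fn * (1 - prior_null)),
    i.e. when x > ln(k)/delta + delta/2."""
    k = (cost_fp * prior_null) / (cost_fn * (1 - prior_null))
    cutoff = math.log(k) / delta + delta / 2
    return norm_sf(cutoff)

# The implied alpha moves sharply with the assumed error-cost ratio,
# holding the prior and the alternative fixed:
for ratio in (1, 4, 20):
    a = implied_alpha(cost_fp=ratio, cost_fn=1, prior_null=0.5, delta=2)
    print(f"false-positive cost {ratio}x false-negative -> alpha = {a:.3f}")

print("Bonferroni per-test alpha for 10 tests:", bonferroni_alpha(0.05, 10))
```

Running the loop shows α falling by more than an order of magnitude as the false-positive cost rises from 1× to 20× the false-negative cost, which is the sensitivity the abstract says makes stakeholder disagreement inevitable: the cost ratios and prior are stipulated inputs, not quantities the data can supply.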
Hierarchical (multilevel) regression methods (including Bayesian, semi-Bayes, and empirical-Bayes methods) provide preferable alternatives to conventional adjustments, insofar as they facilitate use of background information in the analysis model, and thus can provide better-informed estimates on which to base inferences and decisions.
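A hedged sketch of the hierarchical alternative mentioned above, using a simple normal-normal model that is my illustration rather than code from the paper: instead of discarding comparisons via a cut-off, each raw estimate is shrunk toward a prior mean, with the prior variance either specified from background information (a semi-Bayes analysis) or estimated from the ensemble of coefficients by method of moments (an empirical-Bayes analysis).

```python
def shrink(est, var, prior_mean, prior_var):
    """Normal-normal posterior for one coefficient: a precision-weighted
    average of the raw estimate and the prior mean."""
    w = prior_var / (prior_var + var)        # weight given to the data
    post_mean = prior_mean + w * (est - prior_mean)
    post_var = w * var                       # = 1/(1/var + 1/prior_var)
    return post_mean, post_var

def empirical_bayes(ests, variances):
    """Empirical-Bayes version: prior mean and variance estimated from
    the ensemble of coefficients by method of moments."""
    m = len(ests)
    prior_mean = sum(ests) / m
    spread = sum((b - prior_mean) ** 2 for b in ests) / (m - 1)
    # observed spread minus average sampling variance, truncated at 0
    prior_var = max(0.0, spread - sum(variances) / m)
    return [shrink(b, v, prior_mean, prior_var)
            for b, v in zip(ests, variances)]

# Five hypothetical log-rate-ratio estimates, each with sampling
# variance 0.04 (standard error 0.2):
ests = [1.2, 0.3, -0.1, 0.6, 0.9]
for b, (pm, pv) in zip(ests, empirical_bayes(ests, [0.04] * 5)):
    print(f"raw {b:+.2f} -> shrunk {pm:+.2f} (posterior var {pv:.3f})")
```

Each shrunken estimate lands between its raw value and the ensemble mean, with extreme estimates pulled in most; this is how such models "provide better-informed estimates" in place of an all-or-nothing multiplicity penalty.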