Rovetta Alessandro, Mansournia Mohammad Ali
International Committee Against the Misuse of Statistical Significance, Bovezzo, Italy.
Department of Epidemiology and Biostatistics, School of Public Health, Tehran University of Medical Sciences, Tehran, Iran.
J Prev Med Public Health. 2024 Nov;57(6):511-520. doi: 10.3961/jpmph.24.250. Epub 2024 Sep 20.
Statistical testing in medicine is a controversial and commonly misunderstood topic. Despite decades of efforts by renowned associations and international experts, fallacies such as nullism, the magnitude fallacy, and dichotomania are still widespread within clinical and epidemiological research. This can lead to serious health errors (e.g., misidentification of adverse reactions). In this regard, our work sheds light on another common interpretive and cognitive error: the fallacy of high significance, understood as the mistaken tendency to prioritize findings that lead to low p-values. Indeed, there are target hypotheses (e.g., a hazard ratio of 0.10) for which a high p-value is an optimal and desirable outcome. Accordingly, we propose a novel method that goes beyond mere null hypothesis testing by assessing the statistical surprise of the experimental result compared to the prediction of several target assumptions. Additionally, we formalize the concept of interval hypotheses based on prior information about costs, risks, and benefits for the stakeholders (NORD-h protocol). The incompatibility graph (or surprisal graph) is adopted in this context. Finally, we discuss the epistemic necessity for a descriptive, (quasi) unconditional approach in statistics, which is essential to draw valid conclusions about the consistency of data with all relevant possibilities, including study limitations. Given these considerations, this new protocol has the potential to significantly impact the production of reliable evidence in public health.
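The idea of assessing the result against several target hypotheses, rather than only the null, can be illustrated with a minimal sketch. It rests on assumptions the abstract does not spell out: surprise is measured by the Shannon S-value, s = -log2(p), and p-values come from a two-sided Wald (normal) approximation on the log hazard ratio scale. The function names, the estimate (HR = 0.50), and the standard error (0.30) are hypothetical and chosen only for illustration; this is not the authors' NORD-h implementation.

```python
import math


def p_value(estimate_hr, se_log_hr, target_hr):
    """Two-sided p-value for a target hazard ratio (Wald approximation, assumed)."""
    z = (math.log(estimate_hr) - math.log(target_hr)) / se_log_hr
    # For a standard normal Z, P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


def s_value(p):
    """Surprisal in bits: s = -log2(p) (assumed definition of 'statistical surprise')."""
    if p == 0.0:
        return math.inf  # p underflowed to zero; surprise is effectively unbounded
    return -math.log2(p)


def surprisal_table(estimate_hr, se_log_hr, targets):
    """Return (target, p, s) for each target hypothesis."""
    rows = []
    for target in targets:
        p = p_value(estimate_hr, se_log_hr, target)
        rows.append((target, p, s_value(p)))
    return rows


if __name__ == "__main__":
    # Hypothetical study result: HR = 0.50 with SE(log HR) = 0.30.
    for target, p, s in surprisal_table(0.50, 0.30, [0.10, 0.50, 0.80, 1.00]):
        print(f"target HR = {target:.2f}  p = {p:.4f}  s = {s:.2f} bits")
```

Under these assumptions, a target hypothesis equal to the point estimate yields p = 1 and zero bits of surprise, which is the sense in which a high p-value can be the desirable outcome for a hypothesis of interest. Plotting s against a grid of target values would give a curve in the spirit of the surprisal graph mentioned in the abstract.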