Analytix Thinking, LLC, Indianapolis, Indiana, USA.
Clin Pharmacol Ther. 2021 Jun;109(6):1489-1498. doi: 10.1002/cpt.2004. Epub 2020 Sep 26.
Null hypothesis significance testing (NHST) with its benchmark P value < 0.05 has long been a stalwart of scientific reporting and such statistically significant findings have been used to imply scientifically or clinically significant findings. Challenges to this approach have arisen over the past 6 decades, but they have largely been unheeded. There is a growing movement for using Bayesian statistical inference to quantify the probability that a scientific finding is credible. There have been differences of opinion between the frequentist (i.e., NHST) and Bayesian schools of inference, and warnings about the use or misuse of P values have come from both schools of thought spanning many decades. Controversies in this arena have been heightened by the American Statistical Association statement on P values and the further denouncement of the term "statistical significance" by others. My experience has been that many scientists, including many statisticians, do not have a sound conceptual grasp of the fundamental differences in these approaches, thereby creating even greater confusion and acrimony. If we let A represent the observed data, and B represent the hypothesis of interest, then the fundamental distinction between these two approaches can be described as the frequentist approach using the conditional probability pr(A | B) (i.e., the P value), and the Bayesian approach using pr(B | A) (the posterior probability). This paper will further explain the fundamental differences in NHST and Bayesian approaches and demonstrate how they can co-exist harmoniously to guide clinical trial design and inference.
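The distinction between pr(A | B) and pr(B | A) can be illustrated with a small numerical sketch. Everything in it is a hypothetical assumption chosen for illustration and is not taken from the paper: a single-arm trial with 14 responders out of 20 patients, a null response rate of 0.5, a single alternative rate of 0.7, and a 50/50 prior between the two hypotheses. Note also that a P value is, strictly, the probability under the null of data at least as extreme as those observed, so the frequentist quantity below is a tail probability rather than the probability of the exact observed count.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n Bernoulli(p) trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def one_sided_p_value(successes: int, n: int, p_null: float) -> float:
    """pr(data at least this extreme | null): upper-tail binomial probability."""
    return sum(binomial_pmf(k, n, p_null) for k in range(successes, n + 1))


def posterior_null(successes: int, n: int, p_null: float, p_alt: float,
                   prior_null: float) -> float:
    """pr(null | data) via Bayes' theorem for two point hypotheses."""
    weight_null = prior_null * binomial_pmf(successes, n, p_null)
    weight_alt = (1 - prior_null) * binomial_pmf(successes, n, p_alt)
    return weight_null / (weight_null + weight_alt)


# Hypothetical trial data and hypotheses (assumptions, not from the paper).
n_patients, responders = 20, 14
p_value = one_sided_p_value(responders, n_patients, p_null=0.5)
post_null = posterior_null(responders, n_patients, p_null=0.5, p_alt=0.7,
                           prior_null=0.5)

print(f"one-sided P value, pr(data or more extreme | H0): {p_value:.4f}")
print(f"posterior probability, pr(H0 | data):             {post_null:.4f}")
```

Under these assumptions the two numbers differ, which is the point: the P value conditions on the hypothesis and describes the data, whereas the posterior conditions on the data and describes the hypothesis. The posterior also depends on the assumed prior and alternative, so changing either would change that result while leaving the P value untouched.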