Sidebotham David, Barlow C Jake
Department of Anaesthesia, Auckland City Hospital, Auckland, New Zealand.
Cardiothoracic and Vascular Intensive Care Unit, Auckland City Hospital, Auckland, New Zealand.
BJA Open. 2022 Mar 1;1:100003. doi: 10.1016/j.bjao.2022.100003. eCollection 2022 Mar.
In medical research, null hypothesis significance testing (NHST) is the dominant framework for statistical inference. NHST involves calculating P-values and confidence intervals to quantify the evidence against the null hypothesis of no effect. However, P-values and confidence intervals cannot tell us the probability that the hypothesis is true. In contrast, false-positive risk (FPR) and false-negative risk (FNR) are post-test probabilities concerning the truth of the hypothesis, that is to say, the probability a real effect exists.
We calculated the FPR or FNR for 53 individual multicentre trials in critical care based on a pretest probability of 0.5 that the hypothesis was true.
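As a rough illustration of how a pretest probability becomes a post-test probability, the sketch below applies Bayes' theorem in odds form. It is not the authors' procedure: this abstract does not say how the evidence from each trial was quantified, so the function name `post_test_probabilities` and the likelihood-ratio input are assumptions made for the example, and the numbers used are hypothetical.

```python
def post_test_probabilities(likelihood_ratio_h1, prior_h1=0.5):
    """Return (P(real effect | data), P(no effect | data)).

    likelihood_ratio_h1 is an assumed input: how many times more probable
    the observed data are under a real effect than under the null.
    prior_h1 is the pretest probability that a real effect exists.
    """
    if not 0.0 < prior_h1 < 1.0:
        raise ValueError("prior_h1 must lie strictly between 0 and 1")
    if likelihood_ratio_h1 <= 0.0:
        raise ValueError("likelihood_ratio_h1 must be positive")

    # Bayes' theorem in odds form: posterior odds = prior odds * likelihood ratio.
    prior_odds = prior_h1 / (1.0 - prior_h1)
    posterior_odds = prior_odds * likelihood_ratio_h1
    p_effect = posterior_odds / (1.0 + posterior_odds)
    return p_effect, 1.0 - p_effect


# Hypothetical significant trial: data 3 times more probable under a real effect.
_, fpr = post_test_probabilities(3.0)
print(f"FPR = {fpr:.2f}")

# Hypothetical non-significant trial: data 9 times more probable under the null.
fnr, _ = post_test_probabilities(1.0 / 9.0)
print(f"FNR = {fnr:.2f}")
```

With a pretest probability of 0.5 the prior odds equal 1, so the post-test probability depends only on the likelihood ratio. For a trial reporting significance, the FPR is the post-test probability of no effect; for a trial reporting non-significance, the FNR is the post-test probability of a real effect.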
For trials reporting statistical significance, the FPR varied between 0.1% and 57.6%. For trials reporting non-significance, the FNR varied between 1.7% and 36.9%. Twenty-six of 47 trials (55.3%) reporting non-significance provided strong or very strong evidence in favour of the null hypothesis; the remaining trials provided limited evidence. There was no obvious relationship between the P-value and the FNR.
The FPR and FNR showed marked variability, indicating that the probability of a real or absent treatment effect differed substantially between trials. Only one trial reporting statistical significance provided convincing evidence of a real treatment effect, and nearly half of all trials reporting non-significance provided limited evidence for the absence of a treatment effect. Our findings suggest that the quality of evidence from multicentre trials in critical care is highly variable.