Department of Health, Medicine and Caring Sciences, Division of Society and Health, Linköping, Sweden.
J Med Internet Res. 2020 Aug 27;22(8):e21345. doi: 10.2196/21345.
When should a trial stop? This seemingly innocent question evokes concerns about type I and II errors among those who believe that certainty can be the product of uncertainty, and among researchers who have been told that they need to carefully calculate sample sizes, consider multiplicity, and not spend P values on interim analyses. However, the endeavor to dichotomize evidence into significant and nonsignificant has led the basic driving force of science, namely uncertainty, to take a back seat. In this viewpoint, we argue that if testing the null hypothesis is the ultimate goal of science, then we need not worry about writing protocols, considering ethics, applying for funding, or running any experiments at all: all null hypotheses will be rejected at some point, because everything has an effect. The job of science should be to unearth the uncertainties of the effects of treatments, not to test their difference from zero. We also show the fickleness of P values: they may point to a statistically significant result one day, and after a few more participants have been recruited, the once statistically significant effect suddenly disappears. We present plots that we hope intuitively highlight that all assessments of evidence fluctuate over time. Finally, we discuss the remedy in the form of Bayesian methods, in which uncertainty leads and which allow for continuous decision making about whether to stop or continue recruitment as new data from a trial accumulate.
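The fluctuation described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' analysis: it assumes a hypothetical two-arm trial with normally distributed outcomes, a known outcome standard deviation, one participant recruited to each arm at a time, and a normal prior centered on zero for the treatment effect. The names `monitor_trial` and `count_significance_flips`, and every parameter value, are illustrative assumptions. After each recruited pair, the code recomputes both a two-sided P value and the posterior probability that the effect is greater than zero.

```python
import random
from statistics import NormalDist


def monitor_trial(n_per_arm=200, true_effect=0.2, sd=1.0, prior_sd=1.0, seed=1):
    """Simulate sequential recruitment; return (n, p_value, prob_benefit) per look.

    All parameter values are illustrative assumptions, not taken from the paper.
    """
    rng = random.Random(seed)
    std_normal = NormalDist()
    sum_diff = 0.0
    history = []
    for n in range(1, n_per_arm + 1):
        # Recruit one participant to each arm and record the outcome difference.
        treated = rng.gauss(true_effect, sd)
        control = rng.gauss(0.0, sd)
        sum_diff += treated - control

        mean_diff = sum_diff / n
        # Standard error of the difference in means, with known outcome SD.
        se = sd * (2.0 / n) ** 0.5

        # Frequentist view: two-sided z-test against an effect of zero.
        z = mean_diff / se
        p_value = 2.0 * (1.0 - std_normal.cdf(abs(z)))

        # Bayesian view: conjugate normal update with prior N(0, prior_sd^2).
        post_precision = 1.0 / prior_sd**2 + 1.0 / se**2
        post_mean = (mean_diff / se**2) / post_precision
        post_sd = post_precision ** -0.5
        prob_benefit = 1.0 - NormalDist(post_mean, post_sd).cdf(0.0)

        history.append((n, p_value, prob_benefit))
    return history


def count_significance_flips(history, alpha=0.05):
    """Count how often the P value crosses alpha between consecutive looks."""
    flags = [p < alpha for _, p, _ in history]
    return sum(a != b for a, b in zip(flags, flags[1:]))


if __name__ == "__main__":
    history = monitor_trial()
    print("significance flips:", count_significance_flips(history))
    for n, p, prob in history[::25]:
        print(f"n per arm={n:3d}  P={p:.3f}  Pr(effect > 0 | data)={prob:.3f}")
```

Running the script with different seeds shows how often the significance label changes as participants accumulate, while the posterior probability is a continuous quantity that can be inspected at any point during recruitment. The known-variance z-test and the conjugate normal prior were chosen only to keep the sketch dependency-free; a real trial analysis would need a model suited to its outcome and design.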