Kaiser Permanente Division of Research, Oakland, CA 94612, USA.
Contemp Clin Trials. 2011 Jul;32(4):561-8. doi: 10.1016/j.cct.2011.03.010. Epub 2011 Mar 29.
Few published studies use real trial data to compare Bayesian analytic techniques with traditional approaches for analyzing clinical trials.
We compared Bayesian and frequentist group sequential methods using data from two published clinical trials. We chose two widely accepted frequentist rules, O'Brien-Fleming and Lan-DeMets, and conjugate Bayesian priors. Using the nonparametric bootstrap, we estimated a sampling distribution of stopping times for each method. Because current practice dictates the preservation of an experiment-wise false positive rate (Type I error), we approximated these error rates for our Bayesian and frequentist analyses with the posterior probability of detecting an effect in a simulated null sample. Thus for the data-generated distribution represented by these trials, we were able to compare the relative performance of these techniques.
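The bootstrap procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the analysis used in the paper: the outcome is assumed to be a per-patient treatment effect with known standard deviation `sigma`, the conjugate prior is assumed to be normal, the stopping threshold of 0.975 and the four interim looks are hypothetical, the data are synthetic, and the null sample is built by centering the data at zero. All function names are invented for this sketch.

```python
import random
from statistics import NormalDist


def posterior_prob_positive(obs, sigma, prior_mean, prior_sd):
    """P(theta > 0 | obs) for a normal prior and a normal likelihood with known sigma."""
    prior_prec = 1.0 / prior_sd ** 2
    post_prec = prior_prec + len(obs) / sigma ** 2
    post_mean = (prior_mean * prior_prec + sum(obs) / sigma ** 2) / post_prec
    post_sd = post_prec ** -0.5
    return 1.0 - NormalDist(post_mean, post_sd).cdf(0.0)


def stopping_look(obs, looks, sigma, prior_mean, prior_sd, threshold):
    """Return the 1-based index of the first look that crosses the threshold.

    Returns None when no look crosses it, i.e. no effect is declared.
    """
    for k, n in enumerate(looks, start=1):
        if posterior_prob_positive(obs[:n], sigma, prior_mean, prior_sd) >= threshold:
            return k
    return None


def bootstrap_stopping_times(data, looks, n_boot, rng, **rule):
    """Resample the data with replacement and tabulate the stopping look."""
    counts = {}
    for _ in range(n_boot):
        resample = [rng.choice(data) for _ in data]
        k = stopping_look(resample, looks, **rule)
        counts[k] = counts.get(k, 0) + 1
    return counts


# Synthetic stand-in for trial data (assumption: 200 patients, true effect 0.3).
rng = random.Random(0)
data = [rng.gauss(0.3, 1.0) for _ in range(200)]
looks = [50, 100, 150, 200]
rule = dict(sigma=1.0, prior_mean=0.0, prior_sd=1.0, threshold=0.975)

# Sampling distribution of stopping times for the observed data.
observed = bootstrap_stopping_times(data, looks, 500, rng, **rule)

# Approximate false positive rate: centre the data so the null holds, then
# count how often the rule still declares an effect at any look.
mean = sum(data) / len(data)
null_data = [x - mean for x in data]
null_counts = bootstrap_stopping_times(null_data, looks, 500, rng, **rule)
false_positive_rate = 1.0 - null_counts.get(None, 0) / 500

print(observed)
print(false_positive_rate)
```

Swapping `stopping_look` for a frequentist boundary check would give the corresponding distribution for a group sequential rule, so the two approaches can be compared on the same resamples.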
No final outcomes differed from those of the original trials. However, the timing of trial termination differed substantially by method and varied by trial. For one trial, group sequential designs of either type dictated early stopping of the study. In the other, stopping times were dependent upon the choice of spending function and prior distribution.
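To see why the choice of spending function can shift stopping times, it helps to compare how much of the overall Type I error two common Lan-DeMets spending functions release at each interim look. The abstract does not say which spending functions were used, so the O'Brien-Fleming-type and Pocock-type forms below are assumptions chosen for illustration, as are the two-sided alpha of 0.05 and the four equally spaced information fractions.

```python
from math import e, log, sqrt
from statistics import NormalDist

_STD = NormalDist()  # standard normal distribution


def obf_type_spending(t, alpha=0.05):
    """Cumulative two-sided alpha spent at information fraction t (0 < t <= 1),
    O'Brien-Fleming-type spending function: 2 - 2 * Phi(z_{alpha/2} / sqrt(t))."""
    z = _STD.inv_cdf(1.0 - alpha / 2.0)
    return 2.0 * (1.0 - _STD.cdf(z / sqrt(t)))


def pocock_type_spending(t, alpha=0.05):
    """Cumulative alpha spent at information fraction t,
    Pocock-type spending function: alpha * ln(1 + (e - 1) * t)."""
    return alpha * log(1.0 + (e - 1.0) * t)


# Hypothetical design with four equally spaced looks.
for t in (0.25, 0.5, 0.75, 1.0):
    print(f"t={t:.2f}  OBF-type={obf_type_spending(t):.5f}  "
          f"Pocock-type={pocock_type_spending(t):.5f}")
```

Both functions spend the full alpha by the final look, but the O'Brien-Fleming-type function spends very little early on, so it demands much stronger interim evidence before stopping than the Pocock-type function does.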
These results indicate that trialists should consider Bayesian methods alongside traditional approaches when analyzing clinical trials. Although findings from this small sample did not show that either method consistently outperformed the other, they suggest that these comparisons need to be replicated with data from a variety of clinical trials to determine the conditions under which each method is most efficient.