Department of Biostatistics and Informatics, Colorado School of Public Health, University of Colorado Anschutz Medical Campus, Aurora, CO, USA.
Department of Emergency Medicine, University of Colorado School of Medicine, Aurora, CO, USA.
Trials. 2024 May 9;25(1):312. doi: 10.1186/s13063-024-08136-3.
Clinical trials often involve some form of interim monitoring to determine futility before planned trial completion. While many options for interim monitoring exist (e.g., alpha-spending, conditional power), nonparametric interim monitoring methods are also needed to accommodate more complex trial designs and analyses. The upstrap is a recently proposed nonparametric method that may be applied for interim monitoring.
Upstrapping is motivated by the case resampling bootstrap and involves repeatedly sampling with replacement from the interim data to simulate thousands of fully enrolled trials. A p-value is calculated for each upstrapped trial, and the proportion of upstrapped trials for which the p-value criteria are met is compared with a pre-specified decision threshold. To evaluate the potential utility of upstrapping as a form of interim futility monitoring, we conducted a simulation study considering different sample sizes and several proposed calibration strategies for the upstrap. We first compared trial rejection rates across a selection of threshold combinations to validate the upstrapping method. We then applied upstrapping methods to simulated clinical trial data, directly comparing their performance with more traditional alpha-spending and conditional power interim monitoring methods for futility.
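The resampling procedure described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a two-arm trial with a continuous outcome, resamples within each arm, and uses a normal-approximation test for the difference in means. The function names, the default thresholds (`p_threshold`, `decision_threshold`), and the number of upstraps are hypothetical choices made for the example.

```python
import random
from statistics import NormalDist, mean, variance


def z_test_p_value(x, y):
    """Two-sided p-value for a difference in means (normal approximation).

    Assumption: stands in for whatever analysis the trial actually plans.
    """
    se = (variance(x) / len(x) + variance(y) / len(y)) ** 0.5
    if se == 0:
        # Degenerate resample with no variability in either arm.
        return 0.0 if mean(x) != mean(y) else 1.0
    z = (mean(x) - mean(y)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def upstrap_proportion(treatment, control, n_full_per_arm,
                       n_upstraps=1000, p_threshold=0.05, seed=0):
    """Proportion of upstrapped full-size trials with p < p_threshold.

    Each upstrap samples WITH replacement from the interim data, but up to
    the planned full sample size (here: per arm, an assumed simplification).
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_upstraps):
        t = rng.choices(treatment, k=n_full_per_arm)
        c = rng.choices(control, k=n_full_per_arm)
        if z_test_p_value(t, c) < p_threshold:
            hits += 1
    return hits / n_upstraps


def stop_for_futility(proportion, decision_threshold=0.05):
    """Hypothetical rule: stop if too few upstrapped trials reach significance."""
    return proportion < decision_threshold
```

For example, with interim data of 20 participants per arm and a planned 40 per arm, `upstrap_proportion(treatment, control, n_full_per_arm=40)` returns the share of simulated completed trials that meet the p-value criterion, and `stop_for_futility` compares that share with the decision threshold. How the two thresholds are chosen is the calibration question the study addresses.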
The method validation demonstrated that upstrapping is much more likely to find evidence of futility in the null scenario than in the alternative across a variety of simulation settings. Our three proposed approaches for calibration of the upstrap had different strengths depending on the stopping rules used. Compared to O'Brien-Fleming group sequential methods, upstrapped approaches had type I error rates that differed by at most 1.7% and expected sample size was 2-22% lower in the null scenario, while in the alternative scenario power fluctuated between 15.7% lower and 0.2% higher and expected sample size was 0-15% lower.
In this proof-of-concept simulation study, we evaluated the potential of upstrapping as a resampling-based method for futility monitoring in clinical trials. The trade-offs in expected sample size, power, and type I error rate control indicate that the upstrap can be calibrated to implement futility monitoring with varying degrees of aggressiveness, and that its performance can be matched to that of the alpha-spending and conditional power futility monitoring methods considered.