Statistics and Epidemiology Unit, Warwick Medical School, University of Warwick, CV4 7AL, Coventry, UK.
Warwick Clinical Trials Unit (WCTU), Warwick Medical School, University of Warwick, CV4 7AL, Coventry, UK.
BMC Med Res Methodol. 2022 Oct 1;22(1):256. doi: 10.1186/s12874-022-01734-2.
Assessing the long-term effects of many surgical interventions tested in pragmatic RCTs may require extended periods of participant follow-up to assess effectiveness, and the use of patient-reported outcomes that require large sample sizes. Consequently, the RCTs are often perceived as being expensive and time-consuming, particularly if the results show the test intervention is not effective. Adaptive, and particularly group sequential, designs have great potential to improve the efficiency and reduce the cost of testing new and existing surgical interventions. As a means to assess the potential utility of group sequential designs, we re-analyse data from a number of recent high-profile RCTs and assess whether using such a design would have caused the trial to stop early.
Many pragmatic RCTs monitor participants on a number of occasions (e.g. at 6, 12 and 24 months after surgery) during follow-up as a means to assess recovery and also to keep participants engaged with the trial process. Conventionally one of the outcomes is selected as the primary (final) outcome, for clinical reasons, with others designated as either early or late outcomes. In such settings, novel group sequential designs can use data from early outcomes, as well as from the final outcome, at interim analyses to inform stopping decisions. We describe data from seven recent surgical RCTs (WAT, DRAFFT, WOLLF, FASHION, CSAW, FIXDT, TOPKAT), and outline possible group sequential designs that could plausibly have been proposed at the design stage. We then simulate how these group sequential designs could have proceeded, by using the observed data and dates to replicate how information could have accumulated and decisions been made for each RCT.
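To illustrate why early outcomes can add information at an interim analysis, the following is a minimal sketch and not the method used in the paper. It assumes a single arm, a bivariate normal early and final outcome with known correlation `rho` and known standard deviations, and a monotone pattern in which every participant with a final outcome also has an early outcome. Under those assumptions the variance of the regression-adjusted estimate of the final-outcome mean is sd^2 * ((1 - rho^2) / n_final + rho^2 / n_early). The function names are hypothetical.

```python
def final_mean_variance(n_final, n_early, rho, sd_final=1.0):
    """Variance of the estimated final-outcome mean in one arm.

    n_final : participants with both early and final outcomes observed
    n_early : participants with the early outcome observed (includes n_final)
    rho     : assumed known correlation between early and final outcomes
    sd_final: assumed known standard deviation of the final outcome
    """
    if not 0 < n_final <= n_early:
        raise ValueError("need 0 < n_final <= n_early")
    if not -1.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [-1, 1]")
    # The part of the final outcome not explained by the early outcome is
    # only learned from complete cases; the explained part is learned from
    # everyone who has reached the early time point.
    return sd_final ** 2 * ((1 - rho ** 2) / n_final + rho ** 2 / n_early)


def effective_sample_size(n_final, n_early, rho):
    """Number of final outcomes that would give the same precision
    (for a final outcome with unit variance)."""
    return 1.0 / final_mean_variance(n_final, n_early, rho)


if __name__ == "__main__":
    # Hypothetical interim: 100 final outcomes, 200 early outcomes.
    for rho in (0.0, 0.5, 0.7, 0.9):
        ess = effective_sample_size(100, 200, rho)
        print(f"rho={rho:.1f}: effective final-outcome sample size {ess:.1f}")
```

With no correlation the early data add nothing (the effective sample size stays at 100), while a correlation of 0.7 makes the 100 final outcomes worth roughly 132 in this idealised setting. In practice the correlation and variances are estimated, so the gain would be somewhat smaller.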
The results of the simulated group sequential designs showed that for two of the RCTs it was highly likely that they would have stopped for futility at interim analyses, potentially saving considerable time (15 and 23 months) and costs and avoiding patients being exposed to interventions that were either ineffective or no better than standard care. We discuss the characteristics of RCTs that are important in order to use the methodology we describe, particularly the value of early outcomes and the window of opportunity when early stopping decisions can be made and how it is related to the length of recruitment period and follow-up.
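The relationship between the window for early stopping, the recruitment period and the follow-up time can be sketched with a simple calculation. This is an illustration under a loud assumption of perfectly uniform recruitment with no dropout, and the trial dimensions used are hypothetical rather than taken from any of the seven RCTs.

```python
def outcomes_available(t, n_total, recruit_months, followup_months):
    """Expected number of participants whose outcome at `followup_months`
    has been observed by calendar month `t`, assuming uniform recruitment
    of `n_total` participants over `recruit_months` and no dropout."""
    # Only participants recruited at least `followup_months` ago count.
    recruited_in_time = min(max(t - followup_months, 0.0), recruit_months)
    return n_total * recruited_in_time / recruit_months


if __name__ == "__main__":
    # Hypothetical trial: 400 participants recruited over 24 months,
    # early outcome at 6 months, final (primary) outcome at 12 months.
    n_total, recruit_months = 400, 24
    for t in (6, 12, 18, 24):
        recruited = outcomes_available(t, n_total, recruit_months, 0)
        early = outcomes_available(t, n_total, recruit_months, 6)
        final = outcomes_available(t, n_total, recruit_months, 12)
        print(f"month {t:2d}: recruited={recruited:.0f} "
              f"early={early:.0f} final={final:.0f}")
```

In this hypothetical trial no final outcomes exist before month 12 and recruitment ends at month 24, so interim analyses based on final outcomes alone could influence recruitment only between months 12 and 24. Early outcomes start to accumulate from month 6, and at every time point in that window more participants have early data than final data. A long follow-up relative to the recruitment period narrows the window.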
The results for five of the RCTs tested showed that group sequential designs using early outcome data would have been feasible and likely to provide designs that were at least as efficient as, and possibly more efficient than, the original fixed sample size designs. In general, the amount of information provided by the early outcomes was surprisingly large, due to the strength of correlations with the primary outcome. This suggests that the methods described here are likely to provide benefits more generally across the range of surgical trials and more widely in other application areas where trial designs, outcomes and follow-up patterns are structured and behave similarly.