Martens Michael J, Lian Qinghua, Geller Nancy L, Leifer Eric S, Logan Brent R
Division of Biostatistics, Medical College of Wisconsin, Milwaukee, WI, USA.
Office of Biostatistics Research, National Heart, Lung, and Blood Institute, Bethesda, MD, USA.
Clin Trials. 2025 Jun;22(3):267-278. doi: 10.1177/17407745241304119. Epub 2024 Dec 29.
Background/aims
Safety monitoring is a crucial requirement for Phase II and III clinical trials. To protect patients from toxicity risk, stopping rules may be implemented that halt the study if an unexpectedly high number of events occurs. These rules are constructed using statistical procedures that typically treat toxicity data as binary outcomes. Because the exact dates of toxicities are often available, a strategy that handles them as time-to-event data may offer higher power and require less calendar time to identify excess risk. This work investigates several statistical methods for monitoring safety events as time-to-event endpoints and illustrates our R software package for designing and evaluating these procedures.
Methods
The performance metrics of safety stopping rules derived from Wang-Tsiatis tests, Bayesian Gamma-Poisson models, and sequential probability ratio tests are evaluated and contrasted in Phase II and III trial scenarios. We developed a publicly available R package, "stoppingrule", for designing and assessing these stopping rules. Its utility is illustrated through the design of a stopping rule for Blood and Marrow Transplant Clinical Trials Network 1204 (National Clinical Trial number NCT01998633), a multicenter, Phase II, single-arm trial that assessed the efficacy and safety of bone marrow transplant for the treatment of hemophagocytic lymphohistiocytosis and primary immune deficiencies.
Results
As seen previously in group sequential testing settings, rules with strict stopping criteria early in a study tend to have more lenient stopping criteria late in the trial. Consequently, methods with aggressive early monitoring, such as Gamma-Poisson models with weak priors and certain choices of truncated sequential probability ratio tests, usually yield fewer toxicities and lower power than methods that are more permissive at early stages, such as Gamma-Poisson models with strong priors and the O'Brien-Fleming test. The Pocock test and the maximized sequential probability ratio test, however, ran contrary to these trends: they exhibited both diminished power and higher numbers of toxicities than the other methods. Their extremely aggressive early stopping criteria failed to reserve adequate power to identify safety issues arising after the start of the study. Compared with binary toxicity approaches, our time-to-event methods offered meaningful reductions in expected toxicities, of up to 20%, across the scenarios considered.
Conclusion
Safety monitoring procedures aim to protect study participants from exposure to, and toxicity from, unsafe treatments. Toward this end, we recommend considering the time-to-event-oriented Gamma-Poisson model with a weak prior or the truncated sequential probability ratio test for constructing safety stopping rules, as they performed best at minimizing the number of toxicities in our investigations. Our R package "stoppingrule" offers procedures for creating and assessing stopping rules to aid trial design.
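To make the time-to-event Gamma-Poisson idea concrete, here is a minimal sketch under stated assumptions. It assumes that toxicities follow a Poisson process with rate λ per unit of follow-up time, that λ has a conjugate Gamma(a, b) prior (shape a, rate b) centered on an acceptable rate λ0, and that the trial stops once the posterior probability P(λ > λ0 | data) exceeds a threshold. After d toxicities over total exposure T, the posterior is Gamma(a + d, b + T). The function names, the restriction to an integer prior shape (which gives the Gamma tail a closed form), and the numbers in the example are hypothetical illustrations. They are not the `stoppingrule` package's API or the paper's exact parameterization.

```python
import math


def gamma_tail(shape: int, rate: float, x: float) -> float:
    """P(X > x) for X ~ Gamma(shape, rate) with integer shape (Erlang).

    Uses the identity P(X > x) = P(Poisson(rate * x) <= shape - 1).
    Assumption: shape is a positive integer, which keeps this sketch stdlib-only.
    """
    if shape < 1:
        raise ValueError("this sketch requires an integer shape >= 1")
    mu = rate * x
    term = math.exp(-mu)  # P(Poisson(mu) = 0)
    total = term
    for j in range(1, shape):
        term *= mu / j  # P(Poisson(mu) = j) from P(Poisson(mu) = j - 1)
        total += term
    return total


def prob_rate_exceeds(events: int, exposure: float, lam0: float,
                      prior_shape: int, prior_rate: float) -> float:
    """Posterior P(lambda > lam0) after `events` toxicities over `exposure` time."""
    # Conjugate update: Gamma(a, b) prior -> Gamma(a + d, b + T) posterior.
    return gamma_tail(prior_shape + events, prior_rate + exposure, lam0)


def min_events_to_stop(exposure: float, lam0: float, prior_shape: int,
                       prior_rate: float, threshold: float) -> int:
    """Smallest toxicity count at this total exposure that triggers stopping."""
    d = 0
    # The posterior tail probability rises toward 1 as d grows, so this terminates
    # for any threshold < 1.
    while prob_rate_exceeds(d, exposure, lam0, prior_shape, prior_rate) <= threshold:
        d += 1
    return d


if __name__ == "__main__":
    # Hypothetical design: acceptable rate 0.2 toxicities per person-year,
    # weak prior with shape 1 and rate 1 / 0.2 = 5 (prior mean 0.2),
    # stop when P(lambda > 0.2 | data) > 0.9.
    boundary = {t: min_events_to_stop(t, 0.2, 1, 5.0, 0.9) for t in (0, 5, 10, 20)}
    print(boundary)  # {0: 2, 5: 4, 10: 5, 20: 8}
```

Each entry maps cumulative person-years of follow-up to the number of toxicities that would halt the study at that point. A larger prior shape (a stronger prior centered on λ0) raises the early boundary values. This is one way to see the early-versus-late trade-off described in the Results.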