Otsuka America Pharmaceutical, Inc, Rockville, Maryland.
Merck & Co, Inc, Kenilworth, New Jersey.
JAMA Oncol. 2022 Sep 1;8(9):1294-1300. doi: 10.1001/jamaoncol.2022.2666.
The log-rank test is considered the criterion standard for comparing 2 survival curves in pivotal registrational trials. However, with novel immunotherapies that often violate the proportional hazards assumption over time, the log-rank test can lose power and may fail to detect treatment benefit. The MaxCombo test, a combination of weighted log-rank tests, retains power under different types of nonproportional hazards. The difference in restricted mean survival time (dRMST) test is frequently proposed as an alternative to the log-rank test under nonproportional hazards scenarios.
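To make the "combination of weighted log-rank tests" concrete, the following is a minimal sketch of the statistic behind a MaxCombo-type test, not the implementation used in the study. It assumes the commonly used Fleming-Harrington G(rho, gamma) weights with components (0,0), (0,1), (1,0), and (1,1); the abstract does not state which components were combined. The toy data and function names are hypothetical, and the P value (which requires the joint distribution of the correlated component statistics) is deliberately not computed.

```python
import math

def fh_weighted_logrank_z(times, events, groups, rho=0.0, gamma=0.0):
    """Z statistic of a Fleming-Harrington G(rho, gamma) weighted log-rank test.

    groups: 0 = control, 1 = treatment. rho = gamma = 0 gives the standard
    log-rank test. A negative Z means fewer treatment-arm events than expected.
    """
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    s_prev = 1.0  # pooled Kaplan-Meier estimate just before the current time
    num = 0.0     # weighted sum of (observed - expected) treatment-arm events
    var = 0.0     # weighted hypergeometric variance
    for t in event_times:
        n = sum(1 for x in times if x >= t)
        n1 = sum(1 for x, g in zip(times, groups) if x >= t and g == 1)
        d = sum(1 for x, e in zip(times, events) if x == t and e == 1)
        d1 = sum(1 for x, e, g in zip(times, events, groups)
                 if x == t and e == 1 and g == 1)
        w = (s_prev ** rho) * ((1.0 - s_prev) ** gamma)
        num += w * (d1 - d * n1 / n)
        if n > 1:
            var += w * w * d * (n1 / n) * (1.0 - n1 / n) * (n - d) / (n - 1)
        s_prev *= 1.0 - d / n
    return num / math.sqrt(var)

def maxcombo_statistic(times, events, groups,
                       components=((0, 0), (0, 1), (1, 0), (1, 1))):
    """Return the most extreme component Z, its (rho, gamma), and all Z values."""
    z_values = {c: fh_weighted_logrank_z(times, events, groups, c[0], c[1])
                for c in components}
    best = max(z_values, key=lambda c: abs(z_values[c]))
    return z_values[best], best, z_values

# Hypothetical toy data with delayed separation of the curves.
times = [2, 3, 4, 5, 6, 7, 8, 9, 10, 11,
         2, 3, 4, 5, 12, 14, 16, 18, 20, 22]
events = [1] * 10 + [1, 1, 1, 1, 1, 0, 1, 0, 1, 0]
groups = [0] * 10 + [1] * 10

z_logrank = fh_weighted_logrank_z(times, events, groups)
z_max, which, z_all = maxcombo_statistic(times, events, groups)
```

Because the standard log-rank statistic is one of the components, the most extreme component is never less extreme than the log-rank Z. This is why a valid MaxCombo P value has to account for taking the maximum over correlated statistics.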
To compare the log-rank test with the MaxCombo and dRMST tests in immuno-oncology trials and evaluate their performance in practice.
Comprehensive literature review using Google Scholar, PubMed, and other sources for randomized clinical trials published in peer-reviewed journals or presented at major clinical conferences before December 2019 assessing efficacy of anti-programmed cell death protein 1 or anti-programmed death ligand 1 monoclonal antibodies.
Pivotal studies with overall survival or progression-free survival as the primary or key secondary end point with a planned statistical comparison in the protocol. Sixty-three studies on anti-programmed cell death protein 1 or anti-programmed death ligand 1 monoclonal antibodies used as monotherapy or in combination with other agents in 35 902 patients across multiple solid tumor types were identified.
Statistical comparisons (n = 150) were made between the 3 tests using the analysis populations as defined in the original protocol of each trial.
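As an illustration of the quantity behind the dRMST test, the sketch below computes the restricted mean survival time as the area under a Kaplan-Meier curve up to a truncation time tau, and then the difference between arms. It is a hedged example only: the function names, the toy data, and the choice of tau are assumptions, and the variance estimate needed to turn the difference into a test is omitted.

```python
def km_rmst(times, events, tau):
    """Area under the Kaplan-Meier curve from 0 to tau.

    Assumes tau does not exceed the largest observed time in the arm, so the
    curve is defined over the whole interval.
    """
    event_times = sorted({t for t, e in zip(times, events)
                          if e == 1 and t <= tau})
    surv = 1.0    # current Kaplan-Meier survival estimate
    area = 0.0    # accumulated area under the step function
    last_t = 0.0
    for t in event_times:
        area += surv * (t - last_t)
        n = sum(1 for x in times if x >= t)                      # at risk
        d = sum(1 for x, e in zip(times, events) if x == t and e == 1)
        surv *= 1.0 - d / n
        last_t = t
    area += surv * (tau - last_t)
    return area

def rmst_difference(times, events, groups, tau):
    """RMST(treatment) - RMST(control), with groups coded 1 and 0."""
    arm = lambda g: ([t for t, k in zip(times, groups) if k == g],
                     [e for e, k in zip(events, groups) if k == g])
    t1, e1 = arm(1)
    t0, e0 = arm(0)
    return km_rmst(t1, e1, tau) - km_rmst(t0, e0, tau)

# Hypothetical toy data; tau is set to the smaller of the two arms' maxima.
times = [2, 3, 4, 5, 6, 7, 8, 9, 10, 11,
         2, 3, 4, 5, 12, 14, 16, 18, 20, 22]
events = [1] * 10 + [1, 1, 1, 1, 1, 0, 1, 0, 1, 0]
groups = [0] * 10 + [1] * 10
drmst = rmst_difference(times, events, groups, tau=11.0)
```

With no censoring, the RMST reduces to the average of min(T, tau), which gives a quick sanity check on the area calculation.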
Nominal significance based on a 2-sided .05-level test was used to evaluate concordance. Case studies featuring different types of nonproportional hazards were used to discuss more robust ways of characterizing treatment benefit instead of sole reliance on hazard ratios.
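The concordance rule described above can be sketched as a small classification step. This is an illustrative reading, assuming "nominally significant" means a 2-sided P value strictly below .05; the function name and the example P values are hypothetical and are not taken from any trial in the review.

```python
ALPHA = 0.05  # 2-sided nominal significance level stated in the abstract

def classify_pair(p_a, p_b, alpha=ALPHA):
    """Label a pair of 2-sided P values from two tests on the same comparison.

    Returns "concordant" when both tests reach the same decision, otherwise
    "only_a" or "only_b" to show which test alone was nominally significant.
    """
    sig_a = p_a < alpha
    sig_b = p_b < alpha
    if sig_a == sig_b:
        return "concordant"
    return "only_a" if sig_a else "only_b"

# Hypothetical P values, e.g. test A = log-rank and test B = MaxCombo.
examples = [(0.010, 0.004), (0.080, 0.030), (0.400, 0.350), (0.040, 0.060)]
labels = [classify_pair(p_a, p_b) for p_a, p_b in examples]
```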
In this systematic review and meta-analysis of 63 studies including 35 902 patients, between the log-rank and MaxCombo tests, 135 of 150 comparisons (90%) were concordant; MaxCombo achieved nominal significance in all 15 discordant cases, while log-rank did not. Several cases appeared to have clinically meaningful benefits that would not have been detected using log-rank. Between the log-rank and dRMST tests, 137 of 150 comparisons (91%) were concordant; of the 13 discordant cases, log-rank alone was nominally significant in 5 and dRMST alone in 8. Among all 3 tests, 127 comparisons (85%) were concordant.
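The reported percentages follow directly from the counts in the abstract. The short check below simply reproduces that arithmetic; the dictionary labels are descriptive names chosen for this example.

```python
TOTAL = 150  # statistical comparisons reported in the abstract

concordant_counts = {
    "log-rank vs MaxCombo": 135,
    "log-rank vs dRMST": 137,
    "all 3 tests": 127,
}

# Concordance rate as a whole-number percentage, as reported.
percentages = {name: round(100 * count / TOTAL)
               for name, count in concordant_counts.items()}

# Discordant comparisons are the remainder of the 150.
discordant = {name: TOTAL - count
              for name, count in concordant_counts.items()}

for name in concordant_counts:
    print(f"{name}: {concordant_counts[name]}/{TOTAL} "
          f"({percentages[name]}%), discordant: {discordant[name]}")
```

The 13 discordant log-rank vs dRMST comparisons split into the 5 where only log-rank was nominally significant and the 8 where only dRMST was.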
The findings of this review show that MaxCombo may provide a pragmatic alternative to log-rank when departure from proportional hazards is anticipated. Both tests resulted in the same statistical decision in most comparisons. Discordant studies had modest to meaningful improvements in treatment effect. The dRMST test provided no added sensitivity for detecting treatment differences over log-rank.