Sherry Alexander D, Liu Yufei, Msaouel Pavlos, Lin Timothy A, Koong Alex, Lin Christine, Jaoude Joseph Abi, Patel Roshal R, Kouzy Ramez, El-Alam Molly B, Miller Avital M, Owiwi Mohannad, Ofer Jonathan, Bomze David, McCaw Zachary R, Meirson Tomer, Ludmir Ethan B
Department of Radiation Oncology, Division of Radiation Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA.
Department of Radiation Oncology, City of Hope National Medical Center, Duarte, CA, USA.
medRxiv. 2025 Jan 13:2025.01.11.25320398. doi: 10.1101/2025.01.11.25320398.
Statistical significance currently defines superiority in phase III oncology trials. However, this practice is increasingly questioned. Here, we estimated the fragility of phase III oncology trials.
Using Kaplan-Meier curves for the primary endpoints of 230 two-arm superiority phase III oncology trials, we reconstructed data for individual patients. We estimated the survival-inferred fragility index (SIFI) by iteratively flipping the best responder from the experimental arm to the control arm until the interpretation was changed according to the significance threshold of each trial. Severe fragility was defined by SIFI ≤1%.
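The flipping procedure described above can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes that the "best responder" is the experimental-arm patient with the longest follow-up time, that significance is judged with a two-sided log-rank test, that the trial is initially significant in favor of the experimental arm, and that SIFI as a percentage is taken relative to total enrollment. The function names `logrank_p` and `sifi_flip_best` are hypothetical.

```python
import math

def logrank_p(times, events, arms):
    """Two-sided log-rank p-value for two arms (0 = control, 1 = experimental)."""
    event_times = sorted({t for t, e in zip(times, events) if e})
    o_minus_e, var = 0.0, 0.0
    for t in event_times:
        n = sum(1 for x in times if x >= t)                            # at risk, both arms
        n1 = sum(1 for x, a in zip(times, arms) if x >= t and a == 1)  # at risk, experimental
        d = sum(1 for x, e in zip(times, events) if x == t and e)      # events at t
        d1 = sum(1 for x, e, a in zip(times, events, arms)
                 if x == t and e and a == 1)                           # experimental events at t
        o_minus_e += d1 - d * n1 / n
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if var == 0:
        return 1.0  # no information to separate the arms
    chi2 = o_minus_e ** 2 / var
    # Survival function of a chi-square variable with 1 degree of freedom
    return math.erfc(math.sqrt(chi2 / 2))

def sifi_flip_best(times, events, arms, alpha=0.05):
    """Count flips of the longest-followed experimental patient to the
    control arm until the log-rank p-value is no longer below alpha."""
    arms = list(arms)  # work on a copy so the caller's labels are untouched
    if logrank_p(times, events, arms) >= alpha:
        raise ValueError("this sketch covers only initially significant trials")
    flips = 0
    while logrank_p(times, events, arms) < alpha:
        experimental = [i for i, a in enumerate(arms) if a == 1]
        best = max(experimental, key=lambda i: times[i])  # assumed "best responder"
        arms[best] = 0
        flips += 1
    return flips

# Toy data (invented for illustration): 20 control and 20 experimental patients
times = list(range(1, 41))
events = [1] * 40
arms = [0] * 20 + [1] * 20
flips = sifi_flip_best(times, events, arms)
percent = 100 * flips / len(times)   # SIFI as a share of enrollment (assumption)
severe = percent <= 1.0              # severe fragility threshold from the text
print(flips, percent, severe)
```

A trial that is initially non-significant would need the reverse procedure, which this sketch does not attempt. In practice, `alpha` would be set to each trial's own prespecified significance threshold rather than a fixed 0.05.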
This study included 230 trials enrolling 184,752 patients. The median number of patients required to change trial interpretation was 8 (interquartile range [IQR], 4 to 19), or 1.4% (IQR, 0.7% to 3%), per SIFI. Estimations of SIFI by multiple methods were largely consistent. For trials with an overall survival primary endpoint, the median SIFI was 1% (IQR, 0.5% to 1.9%). Severe fragility was found in 87 trials (38%). As a continuous statistic, the original P value (but not its binary significance interpretation) was associated with fragility and severe fragility. Trials with subsequent FDA approval had lower odds of severe fragility. Lastly, the underlying survival model had differential effects on SIFI estimation.
Even among phase III oncology trials, which directly inform patient care, changes in the outcomes of few patients are often sufficient to change statistical significance and trial interpretation. These findings imply that current definitions of statistical significance used in phase III oncology are inadequate to identify replicable findings.