Ternès Nils, Rotolo Federico, Michiels Stefan
Service de Biostatistique et d'Epidémiologie, Gustave Roussy, B2M, RdC. 114 rue Edouard-Vaillant, 94805, Villejuif, France.
CESP, Fac. de médecine - Univ. Paris-Sud, Fac. de médecine - UVSQ, INSERM, Université Paris-Saclay, Villejuif, 94805, France.
BMC Med Res Methodol. 2017 May 22;17(1):83. doi: 10.1186/s12874-017-0354-0.
BACKGROUND: Thanks to advances in genomics and targeted treatments, a growing number of biomarker-based prediction models are being developed to predict potential treatment benefit in randomized clinical trials. Although the methodological framework for developing and validating prediction models in a high-dimensional setting is becoming well established, no clear guidance yet exists on how to estimate expected survival probabilities in a penalized model with biomarker-by-treatment interactions.

METHODS: Based on a parsimonious biomarker selection in a penalized high-dimensional Cox model (lasso or adaptive lasso), we propose a unified framework to: estimate internally the predictive accuracy metrics of the developed model (using double cross-validation); estimate the individual survival probabilities at a given timepoint; construct confidence intervals for these probabilities (analytical or bootstrap); and visualize them graphically (pointwise or smoothed with splines). We compared these strategies through a simulation study covering scenarios with or without biomarker effects. We applied the strategies to a large randomized phase III clinical trial that evaluated the effect of adding trastuzumab to chemotherapy in 1574 early breast cancer patients, for whom the expression of 462 genes was measured.

RESULTS: In our simulations, penalized regression models using the adaptive lasso estimated the survival probability of new patients with low bias and low standard error; bootstrapped confidence intervals had empirical coverage probability close to the nominal level across very different scenarios. The double cross-validation performed on the training data set closely mimicked the predictive accuracy of the selected models in external validation data. We also propose a visual representation of the expected survival probabilities using splines. In the breast cancer trial, the adaptive lasso penalty selected a prediction model with 4 clinical covariates, the main effects of 98 biomarkers and 24 biomarker-by-treatment interactions, but the expected survival probabilities were highly variable, with very wide confidence intervals.

CONCLUSION: Based on our simulations, we propose a unified framework for: developing a prediction model with biomarker-by-treatment interactions in a high-dimensional setting and validating it in the absence of external data; accurately estimating the expected survival probability of future patients with associated confidence intervals; and graphically visualizing the developed prediction model. All the methods are implemented in the R package biospear, publicly available on CRAN.
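
To make the "individual survival probabilities at a given timepoint" step concrete, here is a minimal Python sketch of how such a probability can be computed once the coefficients of a Cox model with biomarker-by-treatment interactions are available. It is not the biospear implementation. It assumes a 0/1 treatment coding, a Breslow estimate of the baseline cumulative hazard, and the relation S(t | x) = exp(-H0(t) · exp(lp)); the function names, the coefficient names `alpha`, `beta`, `gamma`, and the toy data are all hypothetical.

```python
import numpy as np


def linear_predictor(treatment, biomarkers, alpha, beta, gamma):
    """Cox linear predictor: alpha*T + X.beta + T * (X.gamma).

    alpha: treatment main effect (assumed 0/1 treatment coding)
    beta:  biomarker main effects
    gamma: biomarker-by-treatment interaction effects
    """
    t = np.asarray(treatment, dtype=float)
    x = np.asarray(biomarkers, dtype=float)
    return alpha * t + x @ beta + t * (x @ gamma)


def breslow_cumulative_hazard(time, event, lp, t):
    """Breslow estimate of the baseline cumulative hazard H0(t)."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    risk = np.exp(np.asarray(lp, dtype=float))
    h0 = 0.0
    # Sum over distinct event times up to and including t.
    for u in np.unique(time[event & (time <= t)]):
        n_events = np.sum(event & (time == u))
        risk_set_sum = np.sum(risk[time >= u])
        h0 += n_events / risk_set_sum
    return h0


def survival_probability(h0_t, lp_new):
    """Expected survival probability S(t | x) = exp(-H0(t) * exp(lp))."""
    return np.exp(-h0_t * np.exp(np.asarray(lp_new, dtype=float)))


# Toy training data (hypothetical values, for illustration only).
train_time = [2.0, 3.5, 4.0, 5.0, 6.5, 7.0]
train_event = [1, 1, 0, 1, 0, 1]
train_trt = [0, 1, 0, 1, 0, 1]
train_x = [[0.2, -1.0], [1.1, 0.3], [-0.5, 0.8],
           [0.0, 0.0], [0.7, -0.2], [-1.2, 1.5]]

# Hypothetical coefficients, standing in for a penalized Cox fit.
alpha = -0.3
beta = np.array([0.4, 0.0])
gamma = np.array([0.0, -0.6])

lp_train = linear_predictor(train_trt, train_x, alpha, beta, gamma)
h0_at_5 = breslow_cumulative_hazard(train_time, train_event, lp_train, t=5.0)

# Same new patient under control (0) and under treatment (1).
new_x = [[0.5, 1.0], [0.5, 1.0]]
lp_new = linear_predictor([0, 1], new_x, alpha, beta, gamma)
print(survival_probability(h0_at_5, lp_new))
```

Evaluating the same biomarker profile under both treatment arms, as in the last lines, is what lets the interaction terms translate into a predicted treatment benefit. A bootstrap confidence interval could be approximated by repeating the whole procedure, including the model fit, on resampled training sets and taking percentiles of the resulting probabilities.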