Fatih Kızılaslan, David Michael Swanson, Valeria Vitelli
Oslo Centre for Biostatistics and Epidemiology, Department of Biostatistics, University of Oslo, Norway.
Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX, USA.
Stat Methods Med Res. 2025 Jun;34(6):1192-1218. doi: 10.1177/09622802251327687. Epub 2025 Mar 31.
We introduce a novel mixture cure frailty model for censored survival data. Mixture cure models are preferable when a cured fraction among patients can be assumed. However, such models remain underexplored: frailty structures within cure models are largely undeveloped, and most existing methods do not work for high-dimensional datasets, where the number of predictors is much larger than the number of observations. In this study, we extend the Weibull mixture cure model with a frailty component that captures latent population heterogeneity with respect to the outcome risk. High-dimensional covariates are integrated into both the cure rate and survival parts of the model, making it applicable to high-dimensional omics data. We perform variable selection via adaptive elastic-net penalization and propose a novel approach to inference using the expectation-maximization (EM) algorithm. Extensive simulation studies across various scenarios indicate that the proposed method outperforms competitor models. We apply the approach to bulk RNAseq gene expression data from breast cancer patients in The Cancer Genome Atlas (TCGA) database. A set of prognostic biomarkers is then derived from the selected genes and validated via both functional enrichment analysis and comparison with the existing biological literature. Finally, a prognostic risk score index based on the identified biomarkers is proposed and validated by exploring the patients' survival.
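To make the structure of such a model concrete, the following is a minimal sketch of the population survival function in a Weibull mixture cure model with a frailty term. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the frailty distribution, so a gamma frailty with mean 1 and variance `theta` is assumed here, along with a logistic model for the uncured probability and a proportional-hazards Weibull parameterization. All function and parameter names are hypothetical.

```python
import math


def uncured_probability(z, b0, b):
    """Probability of being uncured (susceptible), via an assumed logistic model."""
    eta = b0 + sum(zi * bi for zi, bi in zip(z, b))
    return 1.0 / (1.0 + math.exp(-eta))


def uncured_survival(t, x, beta, shape, scale, theta=0.0):
    """Marginal survival of uncured subjects at time t.

    Assumes a Weibull cumulative hazard (t / scale) ** shape scaled by
    exp(x . beta), and a gamma frailty (mean 1, variance theta) that
    multiplies the hazard. theta = 0 recovers the model without frailty.
    """
    linear_predictor = sum(xi * bi for xi, bi in zip(x, beta))
    cum_hazard = (t / scale) ** shape * math.exp(linear_predictor)
    if theta <= 0.0:
        return math.exp(-cum_hazard)
    # Integrating the gamma frailty out of exp(-w * cum_hazard)
    return (1.0 + theta * cum_hazard) ** (-1.0 / theta)


def population_survival(t, x, z, beta, b0, b, shape, scale, theta=0.0):
    """Mixture cure survival: cured subjects never experience the event."""
    pi = uncured_probability(z, b0, b)
    return (1.0 - pi) + pi * uncured_survival(t, x, beta, shape, scale, theta)
```

In this sketch the population survival curve starts at 1 and plateaus at the cured fraction `1 - pi` as `t` grows, which is the defining feature of a cure model. The covariate vectors `x` (survival part) and `z` (cure part) may differ or coincide; in a high-dimensional setting the coefficients `beta` and `b` would be estimated under a penalty such as the adaptive elastic net mentioned above, which this sketch does not attempt.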