Moon Hojin, Nguyen Phan N, Park Jaehee, Lee Minho, Ahn Sohyul
Department of Mathematics and Statistics, California State University, Long Beach, 1250 Bellflower Blvd., Long Beach, CA 90840, USA.
Portola High School, Irvine, CA 92618, USA.
J Pers Med. 2025 May 27;15(6):218. doi: 10.3390/jpm15060218.
Background: Adjuvant chemotherapy (ACT) can improve survival outcomes for patients with early-stage non-small cell lung cancer (NSCLC), but its benefit varies significantly across individuals. Identifying patients who are likely to benefit from ACT remains a critical challenge in precision oncology.
Methods: We constructed a meta-database from two publicly available NSCLC gene expression datasets (GSE37745 and GSE29013) to address population heterogeneity. Feature selection was performed using Cox-based univariate screening with leave-one-out cross-validation. We then developed and compared three survival modeling frameworks: bagging with elastic net penalized Cox regression, Random Survival Forests (RSF), and DeepSurv neural survival networks. All models incorporated clinical covariates and selected genomic features to predict survival and recommend ACT versus observation (OBS).
Results: Across 155 patients, RSF achieved the highest predictive performance, with a test concordance index (C-index) of 0.885. Model-based recommendations were associated with improved survival in both training and test datasets, as confirmed by Kaplan-Meier analysis. Key genomic features identified included TTR, MTURN, and ETV3, suggesting their potential relevance in treatment response stratification. DeepSurv demonstrated strong predictive accuracy (C-index = 0.982) but less distinct separation of survival curves than RSF.
Conclusions: Our findings demonstrate that machine learning-driven survival models, particularly RSF, can effectively identify NSCLC patients who may benefit from ACT. This approach supports data-driven, individualized chemotherapy decision-making and contributes to advancing personalized treatment strategies in early-stage NSCLC.
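The concordance index reported above can be illustrated with a minimal sketch of Harrell's C-index. This is not the authors' code: the function name `concordance_index`, the toy follow-up times, event flags, and risk scores are all hypothetical, and pairs with tied survival times are simply skipped, which is one of several conventions in use.

```python
import numpy as np


def concordance_index(time, event, risk):
    """Harrell's C-index (illustrative): the share of comparable patient
    pairs whose predicted risks are ordered consistently with survival.

    time  : observed follow-up times
    event : 1 if the event (death) was observed, 0 if censored
    risk  : model risk scores, higher meaning shorter expected survival
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    risk = np.asarray(risk, dtype=float)

    concordant = 0.0
    comparable = 0
    n = len(time)
    for i in range(n):
        if not event[i]:
            continue  # the earlier member of a pair must have an observed event
        for j in range(n):
            if time[j] > time[i]:  # patient j is known to have outlived patient i
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0  # higher risk, shorter survival: concordant
                elif risk[i] == risk[j]:
                    concordant += 0.5  # tied risk scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data, not taken from the study.
times = [5, 8, 12, 20, 30]
events = [1, 1, 0, 1, 0]
risks = [0.9, 0.4, 0.7, 0.2, 0.1]
print(concordance_index(times, events, risks))  # 0.875 (7 of 8 comparable pairs)
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, so the reported test C-index of 0.885 for RSF means that in roughly 88.5% of comparable patient pairs the model ranked the patient who died earlier as higher risk.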