Choi Seo Hee, Kim Euidam, Heo Seok-Jae, Seol Mi Youn, Chung Yoonsun, Yoon Hong In
Department of Radiation Oncology, Yonsei Cancer Center, Heavy Ion Therapy Research Institute, Yonsei University College of Medicine, Seoul, Republic of Korea.
Department of Nuclear Engineering, Hanyang University, Seoul, Republic of Korea.
Clin Transl Radiat Oncol. 2024 Jul 26;48:100819. doi: 10.1016/j.ctro.2024.100819. eCollection 2024 Sep.
We aimed to develop a machine learning-based prediction model for severe radiation pneumonitis (RP) that integrates clinicopathological, dosimetric, and genetic factors, taking into account the associations of clinical and dosimetric parameters and of single nucleotide polymorphisms (SNPs) in TGF-β1 pathway genes with RP.
We prospectively enrolled 59 patients with primary lung cancer undergoing radiotherapy and analyzed pretreatment blood samples, clinicopathological and dosimetric variables, and 11 functional SNPs in TGF-β pathway genes. Using the Synthetic Minority Over-sampling Technique (SMOTE) and nested cross-validation, we developed a machine learning-based prediction model for severe RP (grade ≥ 2). Feature selection was performed with four methods (filter-based, wrapper-based, embedded, and logistic regression), and performance was evaluated with three machine learning models: XGBoost, random forest (RF), and support vector machine (SVM).
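The oversampling step described above can be illustrated with a minimal, standard-library sketch of the basic SMOTE idea: each synthetic minority sample is an interpolation between a minority sample and one of its nearest minority neighbours. The abstract does not state which SMOTE variant, neighbour count, or software was used, so the function names (`smote`, `oversample_training_fold`), the default `k=5`, and the choice to balance classes exactly are assumptions for illustration only. The sketch also assumes the common practice of oversampling only the training portion of each cross-validation fold, so that held-out patients are never used to generate synthetic samples.

```python
import random


def smote(minority, n_new, k=5, rng=None):
    """Return n_new synthetic points interpolated between minority samples.

    Illustrative sketch of basic SMOTE; not the study's implementation.
    """
    rng = rng or random.Random(0)
    if n_new == 0:
        return []
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Other minority samples, nearest first (squared Euclidean distance).
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


def oversample_training_fold(X_train, y_train, k=5, seed=0):
    """Balance the classes of one training fold; leave the test fold untouched."""
    rng = random.Random(seed)
    pos = [x for x, y in zip(X_train, y_train) if y == 1]
    neg = [x for x, y in zip(X_train, y_train) if y == 0]
    minority, label = (pos, 1) if len(pos) < len(neg) else (neg, 0)
    n_new = abs(len(pos) - len(neg))
    new_points = smote(minority, n_new, k=k, rng=rng)
    return X_train + new_points, y_train + [label] * n_new
```

In a nested cross-validation loop, `oversample_training_fold` would be called inside each inner and outer training split, and the model would then be scored on the original, unmodified held-out patients. Plain interpolation produces fractional values for binary or categorical features such as genotype, so a variant designed for categorical data may be more appropriate in practice.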
Severe RP occurred in 20.3% of patients over a median follow-up of 39.7 months. In the final model, age (>66 years), smoking history, planning target volume (PTV; >300 cc), and the AG/GG genotype of BMP2 rs1979855 were the most significant predictors. Adding genomic variables to the clinicopathological variables significantly improved the AUC compared with clinicopathological variables alone (0.822 vs. 0.741, p = 0.029). The wrapper-based method and the logistic regression model selected the same feature set, which gave the best performance across all machine learning models (AUC: XGBoost 0.815, RF 0.805, SVM 0.712).
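For readers less familiar with the metric reported above, the AUC can be read as the probability that a randomly chosen patient with severe RP receives a higher predicted risk than a randomly chosen patient without it, with ties counted as one half. The following is a minimal standard-library sketch of that pairwise definition. The function name `roc_auc` and the toy labels and scores are hypothetical, and the sketch does not reproduce the statistical test behind the reported p-value, which the abstract does not specify.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    labels: 1 for the event (e.g. severe RP), 0 otherwise.
    scores: predicted risk for each patient, higher meaning more likely.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    # Each (positive, negative) pair scores 1 if ranked correctly, 0.5 if tied.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy data, not from the study: three of the four pairs are ranked correctly.
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

The pairwise form is quadratic in the number of patients, which is acceptable for a cohort of this size; rank-based formulas give the same value more efficiently on large datasets.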
We developed a machine learning-based prediction model for severe RP in which age, smoking history, planning target volume, and BMP2 rs1979855 genotype were significant predictors. Notably, incorporating SNP data significantly improved predictive performance compared with clinicopathological factors alone.