Bao Li, Wang Yu-Tong, Zhuang Jun-Ling, Liu Ai-Jun, Dong Yu-Jun, Chu Bin, Chen Xiao-Huan, Lu Min-Qiu, Shi Lei, Gao Shan, Fang Li-Juan, Xiang Qiu-Qing, Ding Yue-Hua
Department of Hematology, Beijing Jishuitan Hospital, 4th Clinical Medical College of Peking University, Beijing, China.
Department of Hematology, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences, Beijing, China.
Front Oncol. 2022 Jun 30;12:922039. doi: 10.3389/fonc.2022.922039. eCollection 2022.
To use machine learning methods to explore overall survival (OS)-related prognostic factors in elderly multiple myeloma (MM) patients.
Data were cleaned, and missing values were imputed using simple imputation methods. Two data resampling methods were implemented to facilitate model building and cross-validation. Four algorithms, namely the Cox proportional hazards model (CPH), DeepSurv, DeepHit, and the random survival forest (RSF), were applied to 30 parameters, including baseline data, genetic abnormalities and treatment options, to construct a prognostic model for OS prediction in 338 elderly MM patients (>65 years old) from four hospitals in Beijing. The C-index and the integrated Brier score (IBS) were used to evaluate model performance.
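For readers unfamiliar with the C-index, the following is a minimal sketch of Harrell's concordance index for right-censored data. It is an illustration only, not the authors' implementation; the function name `harrell_c_index` and the convention that a higher score means higher predicted risk are assumptions made here, and pairs with tied follow-up times are simply skipped.

```python
from itertools import combinations


def harrell_c_index(times, events, risk_scores):
    """Harrell's C-index for right-censored survival data (simplified sketch).

    times       -- observed follow-up time per patient
    events      -- 1 if death was observed, 0 if the patient was censored
    risk_scores -- model output; higher means higher predicted risk
                   (assumed convention for this sketch)
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # tied follow-up times are skipped in this sketch
        # a is the patient with the shorter follow-up time
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # earlier patient was censored: pair is not comparable
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0  # earlier death had the higher predicted risk
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data for illustration only (not from the study).
toy_times = [5, 10, 15, 20]
toy_events = [1, 1, 0, 1]
toy_risk = [0.9, 0.7, 0.4, 0.1]
toy_c = harrell_c_index(toy_times, toy_events, toy_risk)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking of patients by risk, which is the scale on which the C-indexes reported for the four models should be read.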
The 30 variables incorporated in the models comprised MM baseline data, induction treatment data and maintenance therapy data. The variable importance test showed that the OS predictions were largely affected by the maintenance schema variable. Visualizing the survival curves by maintenance schema, we found that the immunomodulator group had the best survival rate. C-indexes of 0.769, 0.780, 0.785 and 0.798 and IBS values of 0.142, 0.112, 0.108 and 0.099 were obtained from the CPH, DeepSurv, DeepHit and RSF models, respectively. The RSF model yielded the best scores in fivefold cross-validation, and the results showed that the choice of data resampling method affected the model results.
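Survival curves stratified by a treatment variable are commonly drawn with the Kaplan-Meier estimator; the abstract does not state which estimator was used, so the sketch below is an assumption. The function names `kaplan_meier` and `km_by_group` and the group labels in the toy data are hypothetical, and the numbers are not study data.

```python
from collections import defaultdict


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with at least one death."""
    n_at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, ei in zip(times, events) if ti == t and ei)
        leaving = sum(1 for ti in times if ti == t)  # deaths plus censored at t
        if deaths:
            survival *= 1 - deaths / n_at_risk
            curve.append((t, survival))
        n_at_risk -= leaving
    return curve


def km_by_group(records):
    """records: iterable of (group, time, event); returns {group: curve}."""
    grouped = defaultdict(lambda: ([], []))
    for group, time, event in records:
        grouped[group][0].append(time)
        grouped[group][1].append(event)
    return {g: kaplan_meier(t, e) for g, (t, e) in grouped.items()}


# Toy records for illustration only; group labels are hypothetical.
toy_records = [
    ("group_a", 1, 1), ("group_a", 2, 0), ("group_a", 3, 1), ("group_a", 4, 1),
    ("group_b", 2, 1), ("group_b", 2, 1),
]
toy_curves = km_by_group(toy_records)
```

Plotting each group's step curve on shared axes gives the kind of by-group comparison described above; a formal comparison would additionally need a test such as the log-rank test, which is not shown here.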
Using machine learning, we established an OS model for elderly MM patients that is based on 30 clinical characteristics and treatment variables and does not require genomic data.