Zhang Cheng, Zhang Yi, Yang Ya-Hui, Xu Hui, Zhang Xiao-Peng, Wu Zhi-Jun, Xie Min-Min, Feng Ying, Feng Chong, Ma Tai
Department of Oncology, The First Affiliated Hospital of Anhui Medical University, Hefei, Anhui, China.
Anhui Provincial Cancer Institute/Anhui Provincial Office for Cancer Prevention and Control, Hefei, Anhui, China.
Front Mol Biosci. 2022 Dec 1;9:937242. doi: 10.3389/fmolb.2022.937242. eCollection 2022.
Tumor metastasis is common in patients with gastric cancer (GC) who have undergone curative gastrectomy. Using high-volume clinical data to predict the survival of patients with metastatic GC is clinically meaningful. We aimed to establish an improved machine learning (ML) classifier for predicting whether a patient with metastatic GC would die within 12 months. Eligible patients were enrolled from a Chinese GC cohort, and complete, detailed information was extracted from medical records to generate a high-dimensional dataset. Feature engineering and feature filtering were conducted before modeling with eight algorithms. A 10-fold cross-validation (CV) nested in a holdout CV (8:2) was employed for hyperparameter tuning and model evaluation. Model selection was based on the area under the receiver operating characteristic curve (AUROC), recall, and precision. The selected models were globally explained using interpretable surrogate models. Of the 399 cases (median survival, 8.2 months), 242 patients survived less than 12 months. The linear discriminant analysis (LDA), support vector machine (SVM), and random forest (RF) models had the highest AUROC (0.78 ± 0.021), recall (0.93 ± 0.031), and precision (0.80 ± 0.026), respectively. The LDA model created a new function that generally separated the two classes. The predicted probability of the SVM model was interpreted using a linear regression model visualized as a nomogram. The predicted class of the RF model was explained using a decision tree model. In summary, analyzing high-volume medical data with ML helps produce an improved model for predicting survival in patients with metastatic GC. The algorithm should be selected carefully for each practical scenario.
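The evaluation scheme described in the abstract (an 8:2 holdout with a 10-fold CV run inside the training portion, scored by AUROC, recall, and precision) can be illustrated with a minimal standard-library sketch. This is not the authors' code: the function names `holdout_split`, `k_fold`, `recall_precision`, and `auroc` are hypothetical, the split here is a plain random shuffle (the abstract does not say whether stratification or repeated splits were used), and class 1 is assumed to mean death within 12 months.

```python
import random


def holdout_split(n_samples, test_fraction=0.2, seed=0):
    """Shuffle sample indices and split them into (train, test), e.g. 8:2."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = round(n_samples * test_fraction)
    return indices[n_test:], indices[:n_test]


def k_fold(indices, k=10):
    """Yield (inner_train, validation) index lists for k-fold CV."""
    folds = [indices[i::k] for i in range(k)]
    for i, validation in enumerate(folds):
        inner_train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield inner_train, validation


def recall_precision(y_true, y_pred):
    """Recall and precision for the positive class (label 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    return recall, precision


def auroc(y_true, scores):
    """AUROC as the fraction of positive/negative pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Outer 8:2 holdout on 399 cases, then 10-fold CV inside the training portion.
train_idx, test_idx = holdout_split(399)
folds = list(k_fold(train_idx, k=10))
```

In such a scheme, hyperparameters would be tuned on the inner folds only, and the held-out 20% would be scored once with the chosen settings, so the reported metrics are not influenced by tuning. Which metric to prioritize depends on the use case: recall favors catching most patients at risk of dying within 12 months, while precision limits false alarms.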