Colorectal Research Center, Iran University of Medical Sciences, Tehran, Iran.
British Heart Foundation Cardiovascular Research Centre, University of Glasgow, Glasgow, UK.
Sci Rep. 2023 Mar 13;13(1):4163. doi: 10.1038/s41598-023-31272-w.
Gastric cancer (GC), with a 5-year survival rate of less than 40%, is the fourth leading cause of cancer-related mortality worldwide. This study aims to develop predictive models using different machine learning (ML) classifiers, based on both demographic and clinical variables, to predict the metastasis status of patients with GC. The data comprised 733 GC patients diagnosed at Taleghani tertiary hospital, divided into training and test groups at a ratio of 8:2. To predict metastasis in GC, ML-based algorithms including Naive Bayes (NB), Random Forest (RF), Support Vector Machine (SVM), Neural Network (NN), Decision Tree (RT) and Logistic Regression (LR) were applied with 5-fold cross-validation. To assess model performance, the F1 score, precision, sensitivity, specificity, area under the receiver operating characteristic (ROC) curve (AUC) and precision-recall AUC (PR-AUC) were obtained. Of the 733 patients with GC, 262 (36%) experienced metastasis. Although all models performed well, the indices of the SVM model appear the most favorable (training set: AUC: 0.94, Sensitivity: 0.94; testing set: AUC: 0.85, Sensitivity: 0.92). NN followed, with a higher AUC than SVM (training set: AUC: 0.98; testing set: AUC: 0.86). RF, which identified tumor size and age as the two most important variables, was considered the third most efficient model because of its high specificity and AUC (84% and 87%, respectively). Based on demographic and clinical characteristics, ML approaches can predict the metastasis status of GC patients. Judged by AUC, sensitivity and specificity, SVM and NN can be regarded as the better algorithms among the six ML-based methods applied.
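The performance indices named in the abstract can be illustrated with a minimal sketch in plain Python. This is not the authors' code: the abstract does not state which software was used, the function names are hypothetical, and the toy labels and scores are invented for illustration rather than taken from the study data. Labels use 1 for metastasis and 0 for no metastasis, and the AUC is computed as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one.

```python
from typing import Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn) for binary labels where 1 = metastasis."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> dict[str, float]:
    """Sensitivity, specificity, precision and F1 from hard predictions."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # recall on metastatic cases
    specificity = tn / (tn + fp) if (tn + fp) else 0.0  # recall on non-metastatic cases
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC via pairwise comparison of positive and negative scores (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example (invented values, not study data).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 decision threshold

metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(metrics, auc)
```

Sensitivity, specificity, precision and F1 depend on the chosen decision threshold (0.5 here, as an assumption), whereas the AUC summarizes ranking quality across all thresholds. This is one reason an abstract may report both kinds of index side by side.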