Guo Qiang, Li Junyun, Wei Zhe, Xu Jingjing, Duan Shaojun, Li Jianfeng, Liu Yaxi
Team of Clinical Pharmacy, Department of Pharmacy, Jincheng People's Hospital, Jincheng City, People's Republic of China.
Department of General Surgery, Jincheng People's Hospital, Jincheng City, People's Republic of China.
BMC Cancer. 2025 Jul 1;25(1):1047. doi: 10.1186/s12885-025-14329-z.
BACKGROUND: Given the limitations of traditional imaging examinations in detecting distant metastasis (DM), such as low sensitivity, this study aimed to identify pathological and laboratory risk factors and to establish models that predict DM in patients with colon adenocarcinoma (CA). METHODS: CA patients diagnosed between 2018 and 2021 were retrieved from the Surveillance, Epidemiology, and End Results (SEER) database. Logistic regression was used to identify independent risk factors (IRFs) for DM. Twelve models were then established and evaluated on training and test sets (7:3) drawn from the retrieved patients: BNB (Bernoulli naïve Bayes), DT (decision tree), GBC (gradient boosting classifier), GNB (Gaussian naïve Bayes), KNN (k-nearest neighbors), LDA (linear discriminant analysis), LR (logistic regression), MLP (multi-layer perceptron classifier), MNB (multinomial naïve Bayes), QDA (quadratic discriminant analysis), RFC (random forest classifier), and SVC (support vector machine). Additionally, data on CA patients from Jincheng People's Hospital (JCPH) were collected as an external validation set to assess the models' predictive efficacy. RESULTS: A total of 7,000 and 83 CA patients were retrieved from SEER and JCPH, respectively. Eight IRFs were identified: age 60–79 (OR = 0.589, 95% CI: 0.391–0.887) and age > 80 (OR = 0.456, 95% CI: 0.287–0.722); primary site in the cecum (OR = 1.305, 95% CI: 1.023–1.664); T stage T3 (OR = 8.869, 95% CI: 2.151–36.569) and T4 (OR = 15.912, 95% CI: 3.839–65.955); N stage N1 (OR = 3.853, 95% CI: 2.919–5.087) and N2 (OR = 8.480, 95% CI: 6.322–11.374); more than 12 regional nodes examined (OR = 0.439, 95% CI: 0.326–0.591); tumor deposits (OR = 1.989, 95% CI: 1.639–2.414); carcinoembryonic antigen (CEA) level (OR = 4.552, 95% CI: 3.747–5.530); and perineural invasion (OR = 1.352, 95% CI: 1.112–1.643).
LR showed the best predictive efficacy on both the test set (AUC = 0.892, sensitivity = 0.825, specificity = 0.801) and the external validation set (AUC = 0.868, sensitivity = 1.000, specificity = 0.727). CONCLUSIONS: Machine learning is a promising tool to assist in detecting DM in CA patients.
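The abstract states only that the SEER cohort was divided into training and test sets at a 7:3 ratio. The sketch below shows one common way to do this: a stratified random split that shuffles DM-positive and DM-negative patients separately, so both subsets keep a similar DM prevalence. Stratification, the fixed seed, the 10% toy prevalence, and the function name `split_7_3` are all assumptions for illustration, not the authors' documented procedure. For 7,000 patients the split yields roughly 4,900 training and 2,100 test patients.

```python
import random
from typing import Any, Dict, List, Sequence, Tuple

Pair = Tuple[Any, int]  # (patient record, DM label)

def split_7_3(
    records: Sequence[Any],
    labels: Sequence[int],
    train_frac: float = 0.7,
    seed: int = 42,
) -> Tuple[List[Pair], List[Pair]]:
    """Hypothetical stratified 7:3 split; stratifying by the DM label is an assumption."""
    if len(records) != len(labels):
        raise ValueError("records and labels must have the same length")
    rng = random.Random(seed)  # local RNG with a fixed seed for reproducibility
    by_class: Dict[int, List[Pair]] = {}
    for rec, lab in zip(records, labels):
        by_class.setdefault(lab, []).append((rec, lab))
    train: List[Pair] = []
    test: List[Pair] = []
    for lab in sorted(by_class):  # deterministic class order
        group = by_class[lab]
        rng.shuffle(group)  # shuffle within the class only
        cut = round(len(group) * train_frac)  # this class's training share
        train.extend(group[:cut])
        test.extend(group[cut:])
    return train, test

# Toy cohort: 7,000 patient IDs with a hypothetical 10% DM prevalence
patients = list(range(7000))
dm = [1 if i < 700 else 0 for i in patients]
train, test = split_7_3(patients, dm)
print(len(train), len(test))  # → 4900 2100
```

Splitting within each class keeps the rarer DM-positive group from being over- or under-represented in the test set by chance. In scikit-learn, the same effect is usually obtained with the `stratify` argument of `train_test_split`.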
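Each odds ratio above comes from the fitted logistic regression. For a coefficient β with standard error SE, OR = exp(β), and a Wald 95% CI is exp(β ± 1.96·SE). The abstract does not state which interval method was used. Assuming Wald intervals, the sketch below converts between the two forms. Applied to the reported T3 estimate (OR = 8.869, 95% CI: 2.151–36.569), it recovers an SE of about 0.72 on the log-odds scale. The reported interval is symmetric around ln(8.869) on the log scale, which is consistent with (though not proof of) a Wald interval. The function names are hypothetical.

```python
import math
from typing import Tuple

Z_95 = 1.96  # two-sided 95% standard normal quantile (rounded)

def or_with_wald_ci(beta: float, se: float, z: float = Z_95) -> Tuple[float, float, float]:
    """Return (OR, lower, upper) from a logistic-regression coefficient and its SE."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)

def se_from_reported_ci(lower: float, upper: float, z: float = Z_95) -> float:
    """Back-calculate the log-odds SE from a reported CI, assuming a Wald interval
    (i.e. symmetric on the log scale)."""
    return (math.log(upper) - math.log(lower)) / (2 * z)

# Reported T3 estimate from the abstract: OR = 8.869, 95% CI 2.151-36.569
se_t3 = se_from_reported_ci(2.151, 36.569)
odds_ratio, lo, hi = or_with_wald_ci(math.log(8.869), se_t3)
print(f"SE = {se_t3:.3f}; OR = {odds_ratio:.3f} (95% CI {lo:.3f}-{hi:.3f})")
```

An OR below 1, such as 0.589 for age 60–79, means lower odds of DM relative to the reference category, which the abstract does not name. The wide intervals for T3 and T4 reflect large standard errors on the log-odds scale.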
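The models are compared by AUC, sensitivity, and specificity. The dependency-free sketch below computes all three from true labels and predicted DM probabilities. Sensitivity is TP/(TP+FN), specificity is TN/(TN+FP), and AUC is the probability that a randomly chosen DM-positive patient receives a higher score than a DM-negative one (the Mann-Whitney formulation, with ties counted as one half). The 0.5 decision threshold and the toy labels and scores are hypothetical. The abstract does not report the threshold used.

```python
from typing import Sequence, Tuple

def sensitivity_specificity(
    y_true: Sequence[int], scores: Sequence[float], threshold: float = 0.5
) -> Tuple[float, float]:
    """Classify score >= threshold as DM-positive (threshold is an assumption)."""
    tp = fn = tn = fp = 0
    for y, s in zip(y_true, scores):
        pred = 1 if s >= threshold else 0
        if y == 1:
            tp += pred
            fn += 1 - pred
        else:
            fp += pred
            tn += 1 - pred
    return tp / (tp + fn), tn / (tn + fp)

def auc_mann_whitney(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as P(score_pos > score_neg), ties counted as 0.5; O(n_pos * n_neg)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for ps in pos:
        for ns in neg:
            if ps > ns:
                wins += 1.0
            elif ps == ns:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical toy data: 1 = distant metastasis, 0 = none
y = [1, 1, 0, 0, 1, 0]
p = [0.9, 0.4, 0.35, 0.1, 0.8, 0.6]
sens, spec = sensitivity_specificity(y, p)
print(f"AUC = {auc_mann_whitney(y, p):.3f}, sensitivity = {sens:.3f}, specificity = {spec:.3f}")
# → AUC = 0.889, sensitivity = 0.667, specificity = 0.667
```

AUC is threshold-free, while sensitivity and specificity depend on the chosen cutoff. This explains why a model can report sensitivity = 1.000 alongside a lower specificity. The pairwise loop is quadratic in sample size, which is adequate for a test set of a few thousand patients. A rank-based formula scales better.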