Hao Y, Wu L N, Lyu Y T, Liu Y Z, Qin X S, Zheng R
Department of Laboratory Medicine, Shengjing Hospital of China Medical University, Shenyang 110000, China; Liaoning Clinical Research Center for Laboratory Medicine, Shenyang 110000, China.
Biological Sciences, City University of Hong Kong, Hong Kong 999077, China.
Zhonghua Yu Fang Yi Xue Za Zhi. 2023 Nov 6;57(11):1827-1838. doi: 10.3760/cma.j.cn112150-20221111-01099.
This study established and validated a diagnostic model using machine learning algorithms. The model was used to evaluate seven tumor-associated autoantibodies (TAABs), namely anti-p53, PGP9.5, SOX2, GAGE7, GBU4-5, MAGEA1 and CAGE antibodies, for diagnosing non-small cell lung cancer (NSCLC) and for differentiating NSCLC from benign lung nodules.

This was a retrospective study of clinical cases. Model-building cohort: a total of 227 primary patients who underwent radical lung cancer surgery in the Department of Thoracic Surgery, Shengjing Hospital of China Medical University, from November 2018 to June 2021 were enrolled as the NSCLC group, and 120 cases of benign lung nodules, 122 cases of pneumonia and 120 healthy individuals were selected as the control groups. External validation cohort: a total of 100 primary patients who underwent radical lung cancer surgery in the same department from May 2022 to December 2022 were enrolled as the NSCLC group, and 36 cases of benign lung nodules, 32 cases of pneumonia and 44 healthy individuals were selected as the control groups. NSCLC cases were further divided into early-stage (stage 0-IB) and mid-to-late-stage (stage IIA-IIIB) subgroups. Levels of the seven TAABs were measured by enzyme immunoassay, and serum concentrations of CEA and CYFRA21-1 were measured by electrochemiluminescence. Four machine learning algorithms (XGBoost, Lasso logistic regression, Naïve Bayes and support vector machine) were used to build classification models. The best-performing algorithm was selected on the basis of evaluation metrics and used to establish a multi-indicator combination model. An online risk evaluation tool was also developed to assist clinical application.

Except for anti-p53, the levels of the remaining six TAABs, CEA and CYFRA21-1 were significantly higher in the NSCLC group (P<0.05).
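The abstract summarises marker levels as medians (quartiles) and reports Z statistics for the stage comparisons, which points to a nonparametric two-sample test, but the test is not named. The sketch below assumes the Mann-Whitney U (Wilcoxon rank-sum) test with the large-sample normal approximation and a tie-corrected variance. The function names `average_ranks` and `mann_whitney_z` and the example values are hypothetical, not data or code from the study.

```python
from __future__ import annotations

import math
from typing import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """Assign 1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values tied with values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j carry ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_z(x: Sequence[float], y: Sequence[float]) -> tuple[float, float]:
    """Return (U_x, Z) for independent samples x and y (assumed test choice).

    Z uses the normal approximation with a tie-corrected variance and no
    continuity correction; Z > 0 means x tends to be larger than y.
    """
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    n = n1 + n2
    pooled = list(x) + list(y)
    ranks = average_ranks(pooled)
    u_x = sum(ranks[:n1]) - n1 * (n1 + 1) / 2  # U from the rank sum of x
    mean_u = n1 * n2 / 2
    counts: dict[float, int] = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())  # sum of (t^3 - t) over tie groups
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u <= 0:
        raise ValueError("variance is zero (e.g. all values tied)")
    return u_x, (u_x - mean_u) / math.sqrt(var_u)


if __name__ == "__main__":
    late = [3.1, 4.2, 5.0, 6.3]   # hypothetical marker values, mid-to-late stage
    early = [1.2, 2.5, 3.1, 2.0]  # hypothetical marker values, early stage
    u, z = mann_whitney_z(late, early)
    print(u, round(z, 3))  # 15.5 2.178
```

Statistical packages differ in whether they apply a continuity correction, so Z values from this sketch can differ slightly from those produced by other software.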
Serum levels of anti-SOX2 antibody [1.50 (0.60, 10.85) U/ml vs. 0.8 (0.20, 2.10) U/ml, Z=2.630, P<0.05], anti-MAGEA1 antibody [0.20 (0.10, 0.43) U/ml vs. 0.10 (0.10, 0.20) U/ml, Z=2.289, P<0.05], CEA [3.13 (2.12, 5.64) ng/ml vs. 2.11 (1.25, 3.09) ng/ml, Z=3.970, P<0.05] and CYFRA21-1 [4.31 (2.37, 7.14) ng/ml vs. 2.53 (1.92, 3.48) ng/ml, Z=3.959, P<0.05] were significantly higher in patients with mid-to-late-stage NSCLC than in those with early-stage NSCLC. The XGBoost model was used to establish a multi-indicator combined detection model (after excluding anti-p53). The combination of the six TAABs with CYFRA21-1 was the best model for diagnosing NSCLC and early-stage NSCLC. The optimal diagnostic thresholds were 0.410, 0.701 and 0.744 for NSCLC vs. controls, NSCLC vs. benign lung nodules and early-stage NSCLC vs. benign lung nodules, respectively. The corresponding AUCs were 0.828, 0.757 and 0.741 in the model-building cohort and 0.760, 0.710 and 0.660 in the external validation cohort.

For the diagnosis of NSCLC, the 6-TAABs panel is superior to the traditional tumor markers CEA and CYFRA21-1 and can compensate for their shortcomings. For the differential diagnosis of NSCLC and benign lung nodules, "6-TAABs+CYFRA21-1" is the most cost-effective combination and plays an important role in the prevention and screening of early lung cancer.
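The abstract reports an optimal diagnostic threshold and an AUC for each comparison but does not say how the thresholds were chosen. A common criterion is the cut-off that maximises Youden's index J = sensitivity + specificity - 1. The sketch below assumes that criterion and computes the empirical AUC as the share of (case, control) pairs in which the case scores higher. The scores are taken to be the combined model's output (the reported thresholds lie between 0 and 1, consistent with a probability-like score). The function names `rank_auc` and `youden_threshold` and the example data are hypothetical.

```python
from __future__ import annotations

from typing import Sequence


def rank_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Empirical ROC AUC: share of (case, control) pairs in which the case
    (label 1) scores higher than the control (label 0); ties count one half."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = sum(1.0 if c > h else 0.5 if c == h else 0.0 for c in cases for h in controls)
    return wins / (len(cases) * len(controls))


def youden_threshold(scores: Sequence[float], labels: Sequence[int]) -> tuple[float, float, float]:
    """Try every observed score as a cut-off (score >= cut-off => called positive)
    and return (cut-off, sensitivity, specificity) with the largest Youden J.
    On ties the lowest, i.e. most sensitive, cut-off is kept."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = sum(1 for y in labels if y == 0)
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one case and one control")
    best = None
    for cut in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= cut)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < cut)
        sens, spec = tp / n_pos, tn / n_neg
        j = sens + spec - 1.0
        if best is None or j > best[0]:
            best = (j, cut, sens, spec)
    _, cut, sens, spec = best
    return cut, sens, spec


if __name__ == "__main__":
    scores = [0.1, 0.3, 0.45, 0.5, 0.6, 0.7, 0.9]  # hypothetical model outputs
    labels = [0, 0, 1, 0, 1, 1, 1]                  # 1 = NSCLC, 0 = control
    print(rank_auc(scores, labels))          # 0.9166666666666666 (11 of 12 pairs)
    print(youden_threshold(scores, labels))  # (0.6, 0.75, 1.0)
```

Scanning only the observed scores is enough because sensitivity and specificity change only when the cut-off crosses an observed value. Other criteria, such as a fixed target sensitivity, would select different thresholds from the same ROC curve.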