Department of Biostatistics, School of Public Health, Peking University, Beijing, China.
Department of Respiration and Critical Care Medicine, Beijing Chaoyang Hospital, Beijing, China.
Int J Biol Markers. 2023 Jun;38(2):139-146. doi: 10.1177/03936155231158125. Epub 2023 Feb 27.
To evaluate, using machine learning, the diagnostic value of combinations of the tumor markers carcinoembryonic antigen (CEA), carbohydrate antigen (CA) 125, CA153, and CA19-9 in distinguishing malignant pleural effusion (MPE) from non-malignant pleural effusion (non-MPE), and to compare the performance of popular machine learning methods.
A total of 319 samples were collected from patients with pleural effusion in Beijing and Wuhan, China, from January 2018 to June 2020. Five machine learning methods (logistic regression, extreme gradient boosting (XGBoost), Bayesian additive regression trees, random forest, and support vector machine) were used to construct diagnostic models. Sensitivity, specificity, Youden's index, and the area under the receiver operating characteristic curve (AUC) were used to evaluate the performance of the different models.
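The four evaluation metrics named above can be illustrated with a minimal Python sketch. The function names, the 0.5 decision threshold, and the eight toy labels and scores are all assumptions made for illustration; they are not the study's data or code. Sensitivity is TP/(TP+FN), specificity is TN/(TN+FP), Youden's index is sensitivity + specificity - 1, and AUC is computed here as the probability that a randomly chosen MPE sample scores higher than a randomly chosen non-MPE sample.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], scores: Sequence[float],
                     threshold: float) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn); a sample is called positive when score >= threshold."""
    tp = fp = tn = fn = 0
    for label, score in zip(y_true, scores):
        predicted_positive = score >= threshold
        if label == 1 and predicted_positive:
            tp += 1
        elif label == 1:
            fn += 1
        elif predicted_positive:
            fp += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def sensitivity_specificity(y_true: Sequence[int], scores: Sequence[float],
                            threshold: float) -> Tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp, fp, tn, fn = confusion_counts(y_true, scores, threshold)
    return tp / (tp + fn), tn / (tn + fp)


def youden_index(sensitivity: float, specificity: float) -> float:
    """Youden's index J = sensitivity + specificity - 1."""
    return sensitivity + specificity - 1.0


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC via pairwise comparison of positives and negatives; ties count as 0.5."""
    positives = [s for y, s in zip(y_true, scores) if y == 1]
    negatives = [s for y, s in zip(y_true, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy example (made-up values, NOT study data): 1 = MPE, 0 = non-MPE.
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]

sens, spec = sensitivity_specificity(toy_labels, toy_scores, threshold=0.5)
print("sensitivity:", sens)
print("specificity:", spec)
print("Youden's index:", youden_index(sens, spec))
print("AUC:", auc(toy_labels, toy_scores))
```

Sensitivity, specificity, and Youden's index depend on the chosen threshold, whereas AUC summarizes performance across all thresholds, which is one reason an abstract may report both kinds of metric.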
Among diagnostic models based on a single tumor marker, the CEA model constructed with XGBoost performed best (AUC = 0.895, sensitivity = 0.80), and the CA153 model, also constructed with XGBoost, showed the highest specificity (0.98). Among all combinations of tumor markers, the combination of CEA and CA153 achieved the best performance in identifying MPE (AUC = 0.921, sensitivity = 0.85), again with the model constructed by XGBoost.
Diagnostic models for MPE that combined multiple tumor markers outperformed models based on a single tumor marker, particularly in sensitivity. Machine learning methods, especially XGBoost, could improve the overall diagnostic accuracy for MPE.