Chen Qing-Yu, Yin Shu-Min, Shao Ming-Ming, Yi Feng-Shuang, Shi Huan-Zhong
Department of Respiratory and Critical Care Medicine, Beijing Institute of Respiratory Medicine and Beijing Chao-Yang Hospital, Capital Medical University, Beijing, 100020, China.
Medical Research Center, Beijing Institute of Respiratory Medicine and Beijing Chao-Yang Hospital, Capital Medical University, Beijing, 100020, China.
Respir Res. 2025 May 2;26(1):170. doi: 10.1186/s12931-025-03253-2.
Classifying the etiology of pleural effusion is a critical challenge in clinical practice. Traditional diagnostic methods rely on simple cut-off values applied to laboratory tests. Machine learning (ML) offers an alternative approach that may improve diagnostic accuracy and capture non-linear relationships among variables.
A retrospective study was conducted using data from patients diagnosed with pleural effusion. The dataset was divided into training and test sets at a ratio of 7:3, and 6 machine learning algorithms were implemented to diagnose pleural effusion. Model performance was assessed by accuracy, precision, recall, F1 score, and area under the receiver operating characteristic curve (AUC). Feature importance and the average prediction for age, adenosine deaminase (ADA), and lactate dehydrogenase (LDH) were analyzed. A decision tree was visualized.
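The split and the per-class metrics named above can be sketched in a few lines of plain Python. This is a minimal illustration, not the study's code: the function names `split_7_3` and `per_class_metrics` and the toy labels are hypothetical, and a plain shuffled split like this one will not necessarily reproduce the cohort sizes reported in the study.

```python
import random
from collections import Counter


def split_7_3(records, seed=0):
    """Shuffle records and split them into 70% training and 30% test lists."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    cut = len(shuffled) * 7 // 10  # integer arithmetic avoids float rounding
    return shuffled[:cut], shuffled[cut:]


def per_class_metrics(y_true, y_pred):
    """Return overall accuracy and per-class precision, recall, and F1."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    report = {}
    for label in sorted(set(y_true) | set(y_pred)):
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        pr_sum = precision + recall
        f1 = 2 * precision * recall / pr_sum if pr_sum else 0.0
        report[label] = {"precision": precision, "recall": recall, "f1": f1}
    accuracy = sum(tp.values()) / len(y_true)
    return accuracy, report


# Toy labels for illustration only; these are not data from the study.
y_true = ["MPE", "MPE", "TPE", "TPE", "transudate", "MPE"]
y_pred = ["MPE", "TPE", "TPE", "TPE", "transudate", "MPE"]
accuracy, report = per_class_metrics(y_true, y_pred)
train, test = split_7_3(list(range(10)))
```

Reporting F1 per class, rather than a single pooled value, matches how the results are stated for MPE and TPE separately.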
A total of 742 patients were included (training cohort: 522; test cohort: 220), of whom 397 (53.3%) were diagnosed with malignant pleural effusion (MPE) and 253 (34.1%) with tuberculous pleural effusion (TPE). All 6 models performed well in diagnosing MPE, TPE, and transudates. Extreme Gradient Boosting and Random Forest performed better in diagnosing MPE, with F1 scores above 0.890, whereas K-Nearest Neighbors and Tabular Transformer performed better in diagnosing TPE, with F1 scores above 0.870. ADA was identified as the most important feature. The ROC curves of the machine learning models outperformed those of conventional diagnostic thresholds.
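To make the comparison with cut-off thresholds concrete: a fixed cut-off yields a single sensitivity and specificity pair, whereas the AUC summarizes discrimination across all possible thresholds. The sketch below computes both in plain Python. The ADA values, the binary TPE labels, and the cut-off of 40 are made-up illustrative assumptions, not data or thresholds taken from the study, and `auc` and `sens_spec` are hypothetical helper names.

```python
def auc(scores, labels):
    """AUC as the fraction of positive/negative pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sens_spec(scores, labels, cutoff):
    """Sensitivity and specificity when score >= cutoff is called positive."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= cutoff)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < cutoff)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < cutoff)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)


# Toy values for illustration only; not measurements from the study.
ada = [12, 25, 38, 45, 60, 80]   # hypothetical ADA levels
is_tpe = [0, 0, 1, 0, 1, 1]      # 1 = TPE, 0 = other etiology
area = auc(ada, is_tpe)
sensitivity, specificity = sens_spec(ada, is_tpe, cutoff=40)  # assumed cut-off
```

The same `auc` function can be applied to a model's predicted probabilities in place of the raw ADA values, which is one way a model's curve and a single-marker threshold can be compared on equal terms.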
This study demonstrates that ML models using age, ADA, and LDH can effectively classify the etiologies of pleural effusion, suggesting that ML-based approaches may enhance diagnostic decision-making.