Garcia-Zamalloa Alberto, Arnay Rafael, Castilla-Rodriguez Iván, Mar Javier, Gonzalez-Cava Jose Manuel, Ibarrondo Oliver, Salegui Iñaki, De Miguel Juan Antonio, Mugica Nekane, Aguinagalde Borja, Zabaleta Jon, Basauri Begoña, Alonso Marta, Azcue Nekane, Gil Eva, Garmendia Irati, Taboada Jorge
Respiratory and Pleural Diseases Group. Biogipuzkoa Health Research Institute. Internal Medicine Service, Osakidetza/Basque Health Service, Mendaro Hospital, Spain.
Departamento de Ingeniería Informática y de Sistemas, Universidad de La Laguna, Santa Cruz de Tenerife, Spain.
PLoS One. 2025 Sep 5;20(9):e0329668. doi: 10.1371/journal.pone.0329668. eCollection 2025.
To perform an external validation of a previously reported machine learning (ML) approach for predicting the diagnosis of pleural tuberculosis.
We defined two cohorts: a Training group, comprising 273 of the 1,220 effusions from our prospective study (2013-2022); and a Testing group, from a retrospective analysis of 360 effusions from 832 consecutive patients in the Bajo Deba health district (1996-2012). All included effusions were exudative and lymphocytic. In the Training and Testing groups, respectively, 49 and 104 cases were tuberculous, 143 and 92 were malignant, and 81 and 164 were diagnosed with "other diseases"; the pre-test probabilities of pleural tuberculosis were 4% and 12.7%. The variables included were age and, in pleural fluid, pH, adenosine deaminase (ADA), glucose, protein, and lactate dehydrogenase levels, and white cell counts (total and differential). We used two ML classifiers, binary (tuberculous versus non-tuberculous) and three-class (tuberculous, malignant, and others), and compared them with Bayesian analysis.
The best binary classifier yielded a sensitivity of 88%, specificity of 98%, and accuracy of 95%. The best three-class classifier achieved the same accuracy and correctly classified 83% (77/92) of malignant cases. The ML models yielded higher positive predictive values than Bayesian analysis based on ADA > 40 U/l and lymphocyte percentage ≥ 50% (92%).
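As a rough illustration of how the reported figures relate to one another, the sketch below recomputes accuracy from sensitivity, specificity, and the proportion of tuberculous cases in the Testing group (104/360), and applies Bayes' theorem to turn sensitivity and specificity into a positive predictive value at a given pre-test probability. The function names are hypothetical, and the predictive values it prints are illustrative arithmetic only, not results reported by the study.

```python
def accuracy(sensitivity, specificity, prevalence):
    """Overall accuracy as a prevalence-weighted mix of sensitivity and specificity."""
    return sensitivity * prevalence + specificity * (1.0 - prevalence)


def positive_predictive_value(sensitivity, specificity, pretest_probability):
    """Bayes' theorem: probability of disease given a positive result."""
    true_positive_rate = sensitivity * pretest_probability
    false_positive_rate = (1.0 - specificity) * (1.0 - pretest_probability)
    return true_positive_rate / (true_positive_rate + false_positive_rate)


# Figures taken from the abstract: best binary classifier, Testing group.
SENSITIVITY = 0.88
SPECIFICITY = 0.98
TESTING_TB_FRACTION = 104 / 360  # tuberculous cases among Testing effusions

if __name__ == "__main__":
    acc = accuracy(SENSITIVITY, SPECIFICITY, TESTING_TB_FRACTION)
    print(f"accuracy implied by sensitivity/specificity: {acc:.3f}")

    # Assumption: the abstract's pre-test probabilities are used as priors here
    # purely for illustration; the study's own Bayesian analysis may differ.
    for prior in (0.04, 0.127):
        ppv = positive_predictive_value(SENSITIVITY, SPECIFICITY, prior)
        print(f"pre-test probability {prior:.3f} -> PPV {ppv:.3f}")
```

The implied accuracy comes out at about 0.95, which agrees with the 95% reported above. The same sensitivity and specificity give a noticeably lower positive predictive value when the pre-test probability is low, which is why predictive values are interpreted alongside the pre-test probability.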
This external validation confirms the good performance of the previously reported ML approach for predicting the diagnosis of pleural tuberculosis in exudative, lymphocytic pleural effusions, and for identifying the cases most likely to be malignant. In addition, ML was more accurate than the Bayesian approach in our study.