Interpretable noninvasive diagnosis of tuberculous pleural effusion using LGBM and SHAP: development and clinical application of a machine learning model.

Authors

Yao Bihua, Yu Xingyu, Qiu Liannv, Gu Er-Min, Mao Siyu, Jiang Lei, Tong Jijun, Wu Jianguo

Affiliations

Laboratory Medicine Center, Department of Clinical Laboratory, The First People's Hospital of Jiashan Affiliated to Jiaxing University, Jiaxing, Zhejiang, China.

Zhejiang Sci-Tech University, Hangzhou, Zhejiang, China.

Publication information

PeerJ. 2025 May 20;13:e19411. doi: 10.7717/peerj.19411. eCollection 2025.

Abstract

BACKGROUND

Tuberculous pleural effusion (TPE) is a common complication of tuberculosis and remains difficult to diagnose. Timely and precise identification of TPE is vital for effective patient management and prognosis, yet existing diagnostic methods tend to be invasive, slow, and insufficiently accurate. This study aims to design and validate an interpretable machine learning model based on routine laboratory data to enable noninvasive and rapid TPE diagnosis.

METHODS

A multicenter prospective study was conducted across China between January 2021 and September 2024, enrolling 963 patients. The derivation cohort of 763 patients was used for model training and internal validation, while 200 patients formed the external validation cohort. The model was built on 18 routine laboratory parameters, including pleural fluid and serum biomarkers, and multiple machine learning (ML) algorithms were evaluated. Light gradient boosting machine (LGBM) emerged as the top-performing model. SHapley Additive exPlanations (SHAP) analysis was used to assess feature importance and interpretability. Model performance was evaluated using the area under the curve (AUC) and accuracy metrics.
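As an illustration of the two evaluation measures named above, the following minimal sketch computes AUC and accuracy in plain Python. It is not the authors' code: the function names, the 0.5 decision threshold, and the six toy cases are assumptions made for the example. AUC is computed here as the fraction of positive/negative case pairs in which the positive case receives the higher score, with ties counted as half.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive case outscores
    a random negative case (ties count as half a win)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy(labels, scores, threshold=0.5):
    """Fraction of cases classified correctly at a fixed threshold
    (the 0.5 default is an assumption, not taken from the study)."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


# Hypothetical toy data: 1 = TPE, 0 = non-TPE; scores are model probabilities.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.1]

print(f"AUC      = {roc_auc(labels, scores):.4f}")
print(f"Accuracy = {accuracy(labels, scores):.4f}")
```

The pairwise loop is quadratic in the number of cases, which is fine for illustration; a rank-based implementation would be preferable for large cohorts.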

RESULTS

Of the 10 ML models compared, LGBM performed best. Feature importance analysis identified 11 key variables, which were used to construct a highly interpretable LGBM model. The model achieved an AUC of 0.9454 in internal validation and 0.9262 in external validation, indicating strong robustness and generalizability. In external validation, sensitivity was 0.8600, specificity 0.9056, positive predictive value 0.8698, and negative predictive value 0.8686, underscoring the model's accuracy across diverse patient cohorts. SHAP analysis enhanced interpretability by highlighting each feature's contribution to prediction outcomes. The model has since been integrated into clinical practice for noninvasive, rapid TPE diagnosis.
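For readers less familiar with the four diagnostic measures reported for external validation, the sketch below shows how each is derived from a 2x2 confusion matrix. The counts are hypothetical and chosen only to keep the arithmetic easy to follow; they are not the study's data, and the function name is an assumption.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Standard diagnostic accuracy measures from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # true positives among all diseased
        "specificity": tn / (tn + fp),  # true negatives among all non-diseased
        "ppv": tp / (tp + fp),          # diseased among all positive calls
        "npv": tn / (tn + fn),          # non-diseased among all negative calls
    }


# Hypothetical counts for a 200-patient cohort (NOT taken from the paper).
metrics = diagnostic_metrics(tp=40, fp=15, tn=135, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Sensitivity and specificity describe the test itself, whereas the predictive values also depend on how common TPE is in the cohort being tested.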

INTERPRETATION

This interpretable machine learning model offers a noninvasive, accurate solution for early TPE diagnosis, significantly reducing reliance on invasive procedures. The integration of SHAP ensures the model's clinical interpretability, mitigating concerns surrounding the "black-box" nature of many machine learning approaches.
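To make concrete what a per-feature contribution means, the sketch below computes exact Shapley values for a tiny hand-written scoring function by averaging each feature's marginal contribution over all feature orderings. This is an illustration of the underlying idea only: the abstract does not state which SHAP explainer was used, and the toy model, the feature names `marker_a` and `marker_b`, and the all-zero baseline are assumptions. Brute-force enumeration is feasible only for a handful of features, which is why practical SHAP implementations rely on faster model-specific algorithms.

```python
from itertools import permutations


def shapley_values(model, x, baseline):
    """Exact Shapley values of `model` at `x` relative to a single baseline,
    averaged over every ordering in which features can be switched on."""
    names = list(x)
    totals = {name: 0.0 for name in names}
    orderings = list(permutations(names))
    for order in orderings:
        current = dict(baseline)      # start from the baseline input
        previous = model(current)
        for name in order:
            current[name] = x[name]   # switch this feature to its real value
            updated = model(current)
            totals[name] += updated - previous
            previous = updated
    return {name: total / len(orderings) for name, total in totals.items()}


def toy_model(features):
    """Hypothetical risk score with an interaction term (not the LGBM model)."""
    a, b = features["marker_a"], features["marker_b"]
    return 2.0 * a + 3.0 * b + a * b


x = {"marker_a": 1.0, "marker_b": 2.0}
baseline = {"marker_a": 0.0, "marker_b": 0.0}
phi = shapley_values(toy_model, x, baseline)
print(phi)
# Additivity: contributions sum to the gap between prediction and baseline.
print(sum(phi.values()), toy_model(x) - toy_model(baseline))
```

The final line checks the additivity property that gives SHAP its name: the per-feature contributions add up to the difference between the model's output for this patient and its output at the baseline.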

CONCLUSIONS

This interpretable LGBM-based model provides a reliable, noninvasive tool for TPE diagnosis. It supports clinical decision-making with real-time risk assessment and promises broader applicability through future integration into clinical information systems.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1f89/12101438/6768a62ad651/peerj-13-19411-g001.jpg
