Development, deployment, and feature interpretability of a three-class prediction model for pulmonary diseases.

Author information

Cao Zhenyu, Xu Gang, Gao Yuan, Xu Jianying, Tian Fengjuan, Shi Hengfeng, Yang Dengfa, Xie Zongyu, Wang Jian

Affiliations

Department of Radiology, Tongde Hospital of Zhejiang Province Affiliated to Zhejiang Chinese Medical University (Tongde Hospital of Zhejiang Province), Hangzhou, China.

Department of Radiology, Xin Hua Hospital of Huainan, Huainan, China.

Publication information

Insights Imaging. 2025 Jun 26;16(1):133. doi: 10.1186/s13244-025-02020-7.

Abstract

PURPOSE

To develop a high-performance machine learning model for predicting and interpreting features of pulmonary diseases.

PATIENTS AND METHODS

This retrospective study analyzed clinical and imaging data from patients with non-small cell lung cancer (NSCLC), granulomatous inflammation, or benign tumors, collected at multiple centers between January 2015 and October 2023. Data from two hospitals in Anhui Province were split 8:2 into a development set (n = 1696) and a test set (n = 424); data from Zhejiang Province formed an external validation set (n = 909). Features with p < 0.05 in univariate analysis were then screened with the Boruta algorithm, and the selected features were used as inputs to Random Forest (RF) and XGBoost models. Model performance was assessed with receiver operating characteristic (ROC) analysis.
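
The data split and univariate filter described above can be sketched with the Python standard library. This is a minimal sketch under stated assumptions, not the authors' pipeline: the abstract does not say whether the 8:2 split was stratified or which univariate tests were used, so the sketch assumes a split stratified by diagnosis and uses the Kruskal-Wallis test (one continuous feature compared across the three diagnostic groups) as the filter. Boruta is not reproduced, and all function names are hypothetical.

```python
import math
import random
from collections import Counter, defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (dev_indices, test_indices), splitting each class at the same ratio.

    Assumption: the 8:2 split is stratified by diagnosis.
    """
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    rng = random.Random(seed)
    dev, test = [], []
    for label in sorted(by_class):
        idxs = by_class[label][:]
        rng.shuffle(idxs)
        n_test = round(len(idxs) * test_fraction)
        test.extend(idxs[:n_test])
        dev.extend(idxs[n_test:])
    return sorted(dev), sorted(test)


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def kruskal_wallis_p(values, labels):
    """Tie-corrected Kruskal-Wallis p-value for exactly three classes.

    With three groups, H is referred to a chi-square distribution with
    2 degrees of freedom, whose survival function is exactly exp(-H / 2).
    """
    classes = sorted(set(labels))
    assert len(classes) == 3, "closed-form p-value assumes three classes"
    n = len(values)
    ranks = average_ranks(values)
    h = 0.0
    for c in classes:
        group = [r for r, lab in zip(ranks, labels) if lab == c]
        h += sum(group) ** 2 / len(group)
    h = 12.0 / (n * (n + 1)) * h - 3 * (n + 1)
    ties = sum(t ** 3 - t for t in Counter(values).values())
    if ties < n ** 3 - n:  # skip the correction only when every value is identical
        h /= 1 - ties / (n ** 3 - n)
    return math.exp(-max(h, 0.0) / 2)


def univariate_screen(features, labels, alpha=0.05):
    """Names of features whose Kruskal-Wallis p-value is below alpha."""
    return [name for name, vals in features.items()
            if kruskal_wallis_p(vals, labels) < alpha]
```

Categorical features would need a different test, such as a chi-square test of independence. The closed-form p-value exp(-H/2) holds only because three classes give 2 degrees of freedom; other class counts would need a general chi-square survival function.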

RESULTS

A total of 3030 patients were included: 2269 with NSCLC, 529 with granulomatous inflammation, and 232 with benign tumors. In the test set, the Obuchowski indices of RF and XGBoost were 0.7193 (95% CI: 0.6567-0.7812) and 0.8282 (95% CI: 0.7883-0.8650), respectively. In the external validation set, the indices were 0.7932 (95% CI: 0.7572-0.8250) for RF and 0.8074 (95% CI: 0.7740-0.8387) for XGBoost. XGBoost achieved higher accuracy in both the test set (0.81) and the external validation set (0.79). Calibration curves and decision curve analysis (DCA) showed that XGBoost offered a higher net clinical benefit.
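
As generally described, the Obuchowski index is a weighted combination of pairwise AUCs between outcome categories, with a penalty for each kind of misclassification; the abstract reports neither the weights nor the penalties, so the sketch below will not reproduce the published values. As a plainly named stand-in, it computes Hand and Till's M (an equally weighted mean of one-vs-one AUCs) and the decision-curve net benefit at one threshold probability. It assumes each model outputs one probability per class and that DCA is run one class at a time (for example, NSCLC versus the rest); the function names are hypothetical.

```python
from itertools import combinations


def auc(pos_scores, neg_scores):
    """Mann-Whitney AUC: P(pos > neg) + 0.5 * P(pos == neg)."""
    wins = 0.0
    for p in pos_scores:
        for q in neg_scores:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def mean_pairwise_auc(probs, labels, classes):
    """Hand and Till's M: equally weighted mean of one-vs-one AUCs.

    probs[k][c] is the predicted probability that case k belongs to classes[c].
    For each pair (i, j), the AUC of class-i vs class-j cases scored by P(i)
    is averaged with the AUC of class-j vs class-i cases scored by P(j).
    """
    pairs = list(combinations(range(len(classes)), 2))
    total = 0.0
    for i, j in pairs:
        in_i = [p for p, y in zip(probs, labels) if y == classes[i]]
        in_j = [p for p, y in zip(probs, labels) if y == classes[j]]
        a_ij = auc([p[i] for p in in_i], [p[i] for p in in_j])
        a_ji = auc([p[j] for p in in_j], [p[j] for p in in_i])
        total += (a_ij + a_ji) / 2
    return total / len(pairs)


def net_benefit(y_true, y_prob, threshold):
    """Net benefit of treating cases with y_prob >= threshold (0 < threshold < 1).

    y_true holds 1 for the class of interest and 0 otherwise (one-vs-rest).
    NB = TP/n - FP/n * threshold / (1 - threshold).
    """
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1 - threshold)
```

Evaluating net_benefit over a range of thresholds, alongside the treat-all curve (prevalence - (1 - prevalence) * pt / (1 - pt)) and the treat-none line at zero, gives a decision curve; the model with the higher curve over clinically relevant thresholds offers more net benefit.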

CONCLUSION

The XGBoost model outperforms RF in the three-class classification of lung diseases.

CRITICAL RELEVANCE STATEMENT

XGBoost surpasses Random Forest in classifying NSCLC, granulomatous inflammation, and benign tumors, offering greater clinical utility as demonstrated on multicenter data.

KEY POINTS

The lung cancer classification model has broad clinical applicability.

Using CT imaging data, XGBoost outperforms Random Forest.

The XGBoost model can be deployed on a website for clinicians.

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9ca3/12202249/83774db742af/13244_2025_2020_Fig1_HTML.jpg