Development and validation of a deep learning model for multicategory pneumonia classification on chest computed tomography: a multicenter and multireader study.

Author information

Shi Chunzi, Shao Ying, Shan Fei, Shen Jie, Huang Xueni, Chen Chuan, Lu Yang, Zhan Yi, Shi Nannan, Wu Jili, Wang Keying, Gao Yaozong, Shi Yuxin, Song Fengxiang

Affiliations

Department of Radiology, Ruijin Hospital, Shanghai Jiao Tong University, School of Medicine, Shanghai, China.

Qingdao Institute, School of Life Medicine, Department of Radiology, Shanghai Public Health Clinical Center, Fudan University, Qingdao, China.

Publication information

Quant Imaging Med Surg. 2023 Dec 1;13(12):8641-8656. doi: 10.21037/qims-23-1097. Epub 2023 Oct 21.

Abstract

BACKGROUND

Accurate diagnosis of pneumonia is vital for effective disease management and mortality reduction, but pneumonia is easily confused with other conditions on chest computed tomography (CT) because their imaging features overlap. We aimed to develop and validate a deep learning (DL) model based on chest CT for accurate classification of viral pneumonia (VP), bacterial pneumonia (BP), fungal pneumonia (FP), pulmonary tuberculosis (PTB), and no pneumonia (NP) conditions.

METHODS

In total, 1,776 cases from five hospitals in different regions were retrospectively collected from September 2019 to June 2023. All cases were enrolled according to inclusion and exclusion criteria, and ultimately 1,611 cases were used to develop the DL model with 5-fold cross-validation, with 165 cases being used as the external test set. Five radiologists blindly reviewed the images from the internal and external test sets first without and then with DL model assistance. Precision, recall, F1-score, weighted F1-average, and area under the curve (AUC) were used to evaluate the model performance.
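As a minimal sketch of how the reported per-class metrics relate to one another, the following Python snippet computes precision, recall, and F1-score for each of the five categories from one-vs-rest counts, then a weighted F1-average. It assumes the weighted average is weighted by each class's case count (support); the abstract does not state the weighting. The function names and the toy labels are hypothetical and are not taken from the study.

```python
# Illustrative only: toy labels, not study data.
CLASSES = ["VP", "BP", "FP", "PTB", "NP"]


def per_class_metrics(y_true, y_pred, classes=CLASSES):
    """Return {class: (precision, recall, f1, support)} from one-vs-rest counts."""
    metrics = {}
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics[c] = (precision, recall, f1, tp + fn)
    return metrics


def weighted_f1(metrics):
    """Average the per-class F1-scores, weighted by support (an assumption)."""
    total = sum(support for _, _, _, support in metrics.values())
    if total == 0:
        return 0.0
    return sum(f1 * support for _, _, f1, support in metrics.values()) / total


# Hypothetical reference labels and predictions for eight cases.
y_true = ["VP", "VP", "BP", "BP", "FP", "PTB", "NP", "NP"]
y_pred = ["VP", "VP", "BP", "FP", "FP", "PTB", "NP", "NP"]

m = per_class_metrics(y_true, y_pred)
for name, (precision, recall, f1, support) in m.items():
    print(f"{name}: precision={precision:.3f} recall={recall:.3f} "
          f"F1={f1:.3f} n={support}")
print(f"weighted F1-average: {weighted_f1(m):.3f}")
```

In this toy example one BP case is misread as FP, which lowers recall for BP and precision for FP while leaving the other three categories unaffected.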

RESULTS

The F1-scores of the DL model on the internal and external test sets were, respectively, 0.947 [95% confidence interval (CI): 0.936-0.958] and 0.933 (95% CI: 0.916-0.950) for VP, 0.511 (95% CI: 0.487-0.536) and 0.591 (95% CI: 0.557-0.624) for BP, 0.842 (95% CI: 0.824-0.860) and 0.848 (95% CI: 0.824-0.873) for FP, 0.843 (95% CI: 0.826-0.861) and 0.795 (95% CI: 0.767-0.822) for PTB, and 0.975 (95% CI: 0.968-0.983) and 0.976 (95% CI: 0.965-0.986) for NP, with weighted F1-averages of 0.883 (95% CI: 0.867-0.898) and 0.846 (95% CI: 0.822-0.871), respectively. The model performed well overall and showed comparable performance on the internal and external test sets. The F1-score of the DL model was higher than that of the radiologists, and with DL model assistance, the radiologists achieved a higher F1-score. On the external test set, the F1-score of the DL model for FP (0.848; 95% CI: 0.824-0.873) was higher than that of the radiologists (0.541; 95% CI: 0.507-0.575), as was its precision for the other three pneumonia conditions (all P values <0.001). With DL model assistance, the radiologists' F1-score for FP (0.778; 95% CI: 0.750-0.807) was higher than that achieved without assistance (0.541; 95% CI: 0.507-0.575), as was their precision for the other three pneumonia conditions (all P values <0.001).

CONCLUSIONS

The DL approach can effectively classify pneumonia and can help improve radiologists' performance, supporting the full integration of DL results into the routine workflow of clinicians.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7d49/10722067/45fff6147816/qims-13-12-8641-f1.jpg
