Department of Computer Science, Stanford University, Stanford, California, United States of America.
Department of Medicine, Quantitative Sciences Unit, Stanford University, Stanford, California, United States of America.
PLoS Med. 2018 Nov 20;15(11):e1002686. doi: 10.1371/journal.pmed.1002686. eCollection 2018 Nov.
BACKGROUND: Chest radiograph interpretation is critical for the detection of thoracic diseases, including tuberculosis and lung cancer, which affect millions of people worldwide each year. This time-consuming task typically requires expert radiologists to read the images, leading to fatigue-based diagnostic error and a lack of diagnostic expertise in areas of the world where radiologists are not available. Recently, deep learning approaches have been able to achieve expert-level performance in medical image interpretation tasks, powered by large network architectures and fueled by the emergence of large labeled datasets. The purpose of this study is to investigate the performance of a deep learning algorithm on the detection of pathologies in chest radiographs compared with practicing radiologists.

METHODS AND FINDINGS: We developed CheXNeXt, a convolutional neural network to concurrently detect the presence of 14 different pathologies, including pneumonia, pleural effusion, pulmonary masses, and nodules, in frontal-view chest radiographs. CheXNeXt was trained and internally validated on the ChestX-ray8 dataset, with a held-out validation set consisting of 420 images, sampled to contain at least 50 cases of each of the original pathology labels. On this validation set, the majority vote of a panel of 3 board-certified cardiothoracic specialist radiologists served as the reference standard. We compared CheXNeXt's discriminative performance on the validation set to the performance of 9 radiologists using the area under the receiver operating characteristic curve (AUC). The radiologists included 6 board-certified radiologists (average experience 12 years, range 4-28 years) and 3 senior radiology residents, from 3 academic institutions. We found that CheXNeXt achieved radiologist-level performance on 11 pathologies and did not achieve radiologist-level performance on 3 pathologies.
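The abstract describes CheXNeXt as a single network that concurrently scores 14 pathologies. A common way to formulate such multi-label detection is one sigmoid output per pathology trained with independent binary cross-entropy terms; the pure-Python sketch below illustrates that loss under the assumption (not confirmed by this abstract) that CheXNeXt uses this standard formulation. The function names and the abbreviated label list are illustrative, not taken from the CheXNeXt source.

```python
import math

# Four of the 14 pathology labels, listed only for illustration.
PATHOLOGIES = ["pneumonia", "pleural_effusion", "mass", "nodule"]

def sigmoid(z):
    """Map a raw network logit to a per-pathology probability."""
    return 1.0 / (1.0 + math.exp(-z))

def multilabel_bce(logits, labels):
    """Sum of independent binary cross-entropy terms, one per pathology.

    logits -- raw scores from the network, one per pathology
    labels -- 0/1 ground truth, one per pathology
    """
    loss = 0.0
    for z, y in zip(logits, labels):
        p = sigmoid(z)
        loss += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return loss

# Confident, correct predictions on all four labels give a small loss.
print(multilabel_bce([4.0, -4.0, 3.0, -3.0], [1, 0, 1, 0]))
```

Because each label gets its own sigmoid rather than a shared softmax, the model can flag several co-occurring findings (e.g., a mass plus an effusion) on the same radiograph.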
The radiologists achieved statistically significantly higher AUC performance on cardiomegaly, emphysema, and hiatal hernia, with AUCs of 0.888 (95% confidence interval [CI] 0.863-0.910), 0.911 (95% CI 0.866-0.947), and 0.985 (95% CI 0.974-0.991), respectively, whereas CheXNeXt's AUCs were 0.831 (95% CI 0.790-0.870), 0.704 (95% CI 0.567-0.833), and 0.851 (95% CI 0.785-0.909), respectively. CheXNeXt performed better than radiologists in detecting atelectasis, with an AUC of 0.862 (95% CI 0.825-0.895), statistically significantly higher than radiologists' AUC of 0.808 (95% CI 0.777-0.838); there were no statistically significant differences in AUCs for the other 10 pathologies. The average time to interpret the 420 images in the validation set was substantially longer for the radiologists (240 minutes) than for CheXNeXt (1.5 minutes). The main limitations of our study are that neither CheXNeXt nor the radiologists were permitted to use patient history or review prior examinations and that evaluation was limited to a dataset from a single institution.

CONCLUSIONS: In this study, we developed and validated a deep learning algorithm that classified clinically important abnormalities in chest radiographs at a performance level comparable to practicing radiologists. Once tested prospectively in clinical settings, the algorithm could have the potential to expand patient access to chest radiograph diagnostics.
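All of the model-versus-radiologist comparisons above are stated in terms of AUC. As a reminder of what that statistic measures, here is a minimal pure-Python sketch of the empirical AUC (the probability that a randomly chosen positive case scores higher than a randomly chosen negative one, with ties counting one half). This is for illustration only; the paper's own analysis used standard tooling and bootstrap confidence intervals not shown here.

```python
def auc(scores, labels):
    """Empirical AUC via pairwise comparisons (Mann-Whitney U form).

    scores -- model probabilities or confidence scores
    labels -- 0/1 ground truth from the reference standard
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    # Count positive-negative pairs ranked correctly; ties count 0.5.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Perfect separation of positives from negatives gives AUC = 1.0.
print(auc([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0]))  # → 1.0

# One mis-ranked pair out of four gives AUC = 0.75.
print(auc([0.9, 0.2, 0.5, 0.1], [1, 1, 0, 0]))  # → 0.75
```

The pairwise form makes clear why AUC is threshold-free: it depends only on how the model ranks cases, not on any particular operating point, which is what allows a single-number comparison between CheXNeXt and each radiologist.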