Department of Radiology and Biomedical Imaging, University of California, San Francisco, San Francisco, California, United States of America.
Center for Digital Health Innovation, University of California, San Francisco, San Francisco, California, United States of America.
PLoS Med. 2018 Nov 20;15(11):e1002697. doi: 10.1371/journal.pmed.1002697. eCollection 2018 Nov.
BACKGROUND: Pneumothorax can precipitate a life-threatening emergency due to lung collapse and respiratory or circulatory distress. Pneumothorax is typically detected on chest X-ray; however, treatment is reliant on timely review of radiographs. Since current imaging volumes may result in long worklists of radiographs awaiting review, an automated method of prioritizing X-rays with pneumothorax may reduce time to treatment. Our objective was to create a large human-annotated dataset of chest X-rays containing pneumothorax and to train deep convolutional networks to screen for potentially emergent moderate or large pneumothorax at the time of image acquisition.
METHODS AND FINDINGS: In all, 13,292 frontal chest X-rays (3,107 with pneumothorax) were visually annotated by radiologists. This dataset was used to train and evaluate multiple network architectures. Images showing large- or moderate-sized pneumothorax were considered positive, and those with trace or no pneumothorax were considered negative. Images showing small pneumothorax were excluded from training. Using an internal validation set (n = 1,993), we selected the 2 top-performing models; these models were then evaluated on a held-out internal test set based on area under the receiver operating characteristic curve (AUC), sensitivity, specificity, and positive predictive value (PPV). The final internal test was performed initially on a subset with small pneumothorax excluded (as in training; n = 1,701), then on the full test set (n = 1,990), with small pneumothorax included as positive. External evaluation was performed using the National Institutes of Health (NIH) ChestX-ray14 set, a public dataset labeled for chest pathology based on text reports. All images labeled with pneumothorax were considered positive, because the NIH set does not classify pneumothorax by size.
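The labeling rule described in the methods can be sketched in a few lines. This is a minimal illustration, not the authors' code: the category strings, the function name `binary_label`, and the `small_as_positive` flag are all hypothetical names chosen here to mirror the text (small pneumothorax excluded in training and in the first test subset, counted as positive on the full test set).

```python
from typing import Optional

# Hypothetical category names mirroring the size classes named in the abstract.
POSITIVE_SIZES = {"large", "moderate"}
NEGATIVE_SIZES = {"trace", "none"}


def binary_label(size: str, small_as_positive: bool = False) -> Optional[int]:
    """Map a size category to 1 (positive), 0 (negative), or None (excluded).

    small_as_positive=False reproduces the training / test-subset rule
    (small pneumothorax excluded); True reproduces the full-test-set rule
    (small pneumothorax counted as positive).
    """
    if size in POSITIVE_SIZES:
        return 1
    if size in NEGATIVE_SIZES:
        return 0
    if size == "small":
        return 1 if small_as_positive else None
    raise ValueError(f"unknown size category: {size!r}")


# Example: build training labels, dropping excluded images.
sizes = ["large", "small", "none", "moderate", "trace"]
training_labels = [y for y in (binary_label(s) for s in sizes) if y is not None]
full_test_labels = [binary_label(s, small_as_positive=True) for s in sizes]
```

Keeping the exclusion rule behind a single flag makes it explicit that the two internal test results differ only in how small pneumothorax is handled.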
In internal testing, our "high sensitivity model" produced a sensitivity of 0.84 (95% CI 0.78-0.90), specificity of 0.90 (95% CI 0.89-0.92), and AUC of 0.94 for the test subset with small pneumothorax excluded. Our "high specificity model" showed sensitivity of 0.80 (95% CI 0.72-0.86), specificity of 0.97 (95% CI 0.96-0.98), and AUC of 0.96 for this set. PPVs were 0.45 (95% CI 0.39-0.51) and 0.71 (95% CI 0.63-0.77), respectively. Internal testing on the full set showed expected decreased performance (sensitivity 0.55, specificity 0.90, and AUC 0.82 for high sensitivity model and sensitivity 0.45, specificity 0.97, and AUC 0.86 for high specificity model). External testing using the NIH dataset showed some further performance decline (sensitivity 0.28-0.49, specificity 0.85-0.97, and AUC 0.75 for both). Due to labeling differences between internal and external datasets, these findings represent a preliminary step towards external validation.
CONCLUSIONS: We trained automated classifiers to detect moderate and large pneumothorax in frontal chest X-rays at high levels of performance on held-out test data. These models may provide a high specificity screening solution to detect moderate or large pneumothorax on images collected when human review might be delayed, such as overnight. They are not intended for unsupervised diagnosis of all pneumothoraces, as many small pneumothoraces (and some larger ones) are not detected by the algorithm. Implementation studies are warranted to develop appropriate, effective clinician alerts for the potentially critical finding of pneumothorax, and to assess their impact on reducing time to treatment.
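The reported PPVs depend on how common positive cases are in the test subset, which the abstract does not state. As a rough sketch, the standard Bayes relation between sensitivity, specificity, prevalence, and PPV can be inverted to see what prevalence the rounded figures imply. The function names below are hypothetical, and because the inputs are rounded to two decimals the result is only approximate.

```python
def ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Positive predictive value from sensitivity, specificity, and prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


def implied_prevalence(sensitivity: float, specificity: float, ppv_value: float) -> float:
    """Solve the PPV formula for prevalence, given the other three quantities."""
    false_pos_rate = 1.0 - specificity
    numerator = ppv_value * false_pos_rate
    denominator = sensitivity * (1.0 - ppv_value) + numerator
    return numerator / denominator


# Rounded values reported for the test subset with small pneumothorax excluded.
p_high_sens = implied_prevalence(0.84, 0.90, 0.45)
p_high_spec = implied_prevalence(0.80, 0.97, 0.71)
print(f"implied prevalence, high sensitivity model: {p_high_sens:.3f}")
print(f"implied prevalence, high specificity model: {p_high_spec:.3f}")
```

Both models give an implied prevalence of roughly 8 to 9 percent, so the two sets of figures are mutually consistent within rounding. At that prevalence, raising specificity from 0.90 to 0.97 accounts for most of the jump in PPV from 0.45 to 0.71, which is why the abstract frames the second model as the high specificity screening option.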