From the Departments of Computer Science (J.A.D., C.R.), Biomedical Data Science (D.Y., D.L.R.), and Radiology (C.P.L., D.L.R., M.P.L.), Stanford University, 300 Pasteur Dr, Stanford, CA 94305.
Radiology. 2019 Feb;290(2):537-544. doi: 10.1148/radiol.2018181422. Epub 2018 Nov 13.
Purpose To assess the ability of convolutional neural networks (CNNs) to enable high-performance automated binary classification of chest radiographs.
Materials and Methods In a retrospective study, 216 431 frontal chest radiographs obtained between 1998 and 2012 were procured, along with associated text reports and a prospective label from the attending radiologist. This data set was used to train CNNs to classify chest radiographs as normal or abnormal before evaluation on a held-out set of 533 images hand-labeled by expert radiologists. The effects of development set size, training set size, initialization strategy, and network architecture on end performance were assessed by using standard binary classification metrics; detailed error analysis, including visualization of CNN activations, was also performed.
Results Average area under the receiver operating characteristic curve (AUC) was 0.96 for a CNN trained with 200 000 images. This AUC value was greater than that observed when the same model was trained with 2000 images (AUC = 0.84, P < .005) but was not significantly different from that observed when the model was trained with 20 000 images (AUC = 0.95, P > .05). Averaging the CNN output score with the binary prospective label yielded the best-performing classifier, with an AUC of 0.98 (P < .005). Analysis of specific radiographs revealed that the model was heavily influenced by clinically relevant spatial regions but did not reliably generalize beyond thoracic disease.
Conclusion CNNs trained with a modestly sized collection of prospectively labeled chest radiographs achieved high diagnostic performance in the classification of chest radiographs as normal or abnormal; this function may be useful for automated prioritization of abnormal chest radiographs.
© RSNA, 2018. Online supplemental material is available for this article. See also the editorial by van Ginneken in this issue.
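The Results describe two computations that a short example can make concrete: scoring a classifier by AUC, and averaging the CNN output score with the binary prospective label to form a combined classifier. The following is a minimal sketch, not the authors' evaluation code. The function names `roc_auc` and `average_with_prospective_label` are hypothetical, the AUC is computed with the pairwise (Mann-Whitney) definition rather than any particular library, and the toy labels and scores are invented for illustration and are not data from the study.

```python
from typing import List, Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (abnormal, normal) pairs in which the
    abnormal case receives the higher score; ties count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one abnormal and one normal case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_with_prospective_label(
    cnn_scores: Sequence[float], prospective_labels: Sequence[int]
) -> List[float]:
    """Combine each CNN score in [0, 1] with the 0/1 prospective label
    by taking their unweighted mean."""
    return [(s + float(l)) / 2.0 for s, l in zip(cnn_scores, prospective_labels)]


if __name__ == "__main__":
    # Invented toy data (assumption): expert labels are the reference standard.
    expert = [1, 1, 1, 0, 0, 0]
    cnn = [0.9, 0.4, 0.7, 0.6, 0.2, 0.1]
    prospective = [1, 1, 0, 0, 0, 1]  # noisy labels assigned at read time

    combined = average_with_prospective_label(cnn, prospective)
    print("CNN AUC:     ", roc_auc(expert, cnn))
    print("Combined AUC:", roc_auc(expert, combined))
```

On a real evaluation set, the same two calls would be applied to the held-out expert-labeled images. Whether the combined score outperforms the CNN alone depends on the data, and the toy numbers here are not meant to reproduce the reported AUC values.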