Warren Alpert Medical School, Brown University, Box G-9130, Providence, RI, 02912, USA.
Department of Diagnostic Imaging, Rhode Island Hospital, 593 Eddy St, Main, Floor 3, Providence, RI, 02903, USA.
J Digit Imaging. 2019 Oct;32(5):888-896. doi: 10.1007/s10278-019-00180-9.
Our objective was to evaluate the effectiveness of efficient convolutional neural networks (CNNs) for abnormality detection in chest radiographs and to investigate the generalizability of our models on data from independent sources. We used the National Institutes of Health ChestX-ray14 (NIH-CXR) and the Rhode Island Hospital chest radiograph (RIH-CXR) datasets in this study. Both datasets were split into training, validation, and test sets. The DenseNet and MobileNetV2 CNN architectures were used to train models on each dataset to classify chest radiographs as normal or abnormal; models trained on NIH-CXR also predicted the presence of 14 different pathological findings. Models were evaluated on both the NIH-CXR and RIH-CXR test sets using the area under the receiver operating characteristic curve (AUROC). DenseNet and MobileNetV2 achieved AUROCs of 0.900 and 0.893, respectively, for normal versus abnormal classification on NIH-CXR, and AUROCs of 0.960 and 0.951 on RIH-CXR. For the 14 pathological findings in NIH-CXR, MobileNetV2 achieved an AUROC within 0.03 of DenseNet's for each finding, with an average difference of 0.01. When externally validated on independently collected data (e.g., RIH-CXR-trained models evaluated on NIH-CXR), model AUROCs decreased by 3.6-5.2% relative to their locally trained counterparts. MobileNetV2 achieved performance comparable to DenseNet in our analysis, demonstrating the efficacy of efficient CNNs for chest radiograph abnormality detection. In addition, the models generalized to external data, albeit with performance decreases that should be taken into consideration when applying models to data from different institutions.
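All comparisons above rest on the AUROC metric. As a minimal illustration of how that metric is computed from model scores, here is a pure-Python sketch using the Mann-Whitney U (rank-sum) formulation; the labels and scores in the usage line are hypothetical examples, not data from the study:

```python
def auroc(labels, scores):
    """AUROC via the Mann-Whitney U statistic.

    labels: list of 0/1 ground-truth labels (1 = abnormal).
    scores: list of model scores, higher = more likely abnormal.
    Tied scores receive the average rank of their tie group.
    """
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    rank_sum_pos = 0.0  # sum of (1-based, tie-averaged) ranks of positives
    i = 0
    while i < n:
        j = i
        while j < n and pairs[j][0] == pairs[i][0]:
            j += 1  # extend over the tie group [i, j)
        avg_rank = (i + 1 + j) / 2.0  # average of 1-based ranks i+1 .. j
        for k in range(i, j):
            if pairs[k][1] == 1:
                rank_sum_pos += avg_rank
        i = j
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    # U statistic normalized by the number of positive/negative pairs
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


# Hypothetical labels/scores for illustration only:
print(auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # → 0.75
```

This rank-based form is equivalent to integrating the ROC curve and is what libraries such as scikit-learn's `roc_auc_score` compute; it is shown here only to make the evaluation metric concrete.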