Faculty of Medicine, University of Ottawa, Roger Guindon Hall, 451 Smyth Rd #2044, Ottawa, ON, K1H 8M5, Canada.
University of Maryland Medical Intelligent Imaging (UM2II) Center, Department of Diagnostic Radiology and Nuclear Medicine, University of Maryland School of Medicine, 670 W Baltimore St, Room 1172, Baltimore, MD, 21201, USA.
Emerg Radiol. 2022 Apr;29(2):365-370. doi: 10.1007/s10140-022-02019-3. Epub 2022 Jan 10.
Deep convolutional neural networks (DCNNs) for diagnosis of disease on chest radiographs (CXR) have been shown to be biased against males or females if the datasets used to train them have unbalanced sex representation. Prior work has suggested that DCNNs can predict sex on CXR, which could aid forensic evaluations, but also be a source of bias.
To (1) evaluate the performance of DCNNs for predicting sex across different datasets and architectures and (2) evaluate visual biomarkers used by DCNNs to predict sex on CXRs.
Chest radiographs were obtained from the Stanford CheXpert and NIH ChestX-ray14 datasets, which comprise 224,316 and 112,120 CXRs, respectively. To control for dataset size and class imbalance, random undersampling was used to reduce each dataset to 97,560 images balanced for sex. Each dataset was randomly split into training (70%), validation (10%), and test (20%) sets. Four DCNN architectures pre-trained on ImageNet were used for transfer learning. DCNNs were externally validated using the test set from the opposing dataset. Performance was evaluated using area under the receiver operating characteristic curve (AUC). Class activation mapping (CAM) was used to generate heatmaps visualizing the regions contributing to each DCNN's prediction.
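The balancing and splitting steps above can be sketched in plain Python. This is an illustrative sketch, not the authors' exact pipeline: the record format, function name, and split logic are assumptions; the 70/10/20 proportions and sex-balanced random undersampling follow the methods described.

```python
import random

def balance_and_split(records, seed=0, train_frac=0.7, val_frac=0.1):
    """Undersample the majority class so both sexes are equally represented,
    then split into training/validation/test sets (default 70/10/20).

    `records` is a list of (image_id, sex) tuples with sex in {"M", "F"}.
    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    males = [r for r in records if r[1] == "M"]
    females = [r for r in records if r[1] == "F"]
    n = min(len(males), len(females))          # size of the minority class
    balanced = rng.sample(males, n) + rng.sample(females, n)
    rng.shuffle(balanced)
    n_train = int(train_frac * len(balanced))
    n_val = int(val_frac * len(balanced))
    return (balanced[:n_train],                # training set
            balanced[n_train:n_train + n_val], # validation set
            balanced[n_train + n_val:])        # test set
```

Applied to the full CheXpert and ChestX-ray14 collections, this kind of undersampling yields the 97,560 sex-balanced images per dataset reported in the methods.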
On the internal test sets, DCNNs achieved AUCs ranging from 0.98 to 0.99. On external validation, cross-dataset performance peaked at an AUC of 0.94 for the VGG19-Stanford model and 0.95 for the InceptionV3-NIH model. Heatmaps highlighted similar regions of attention across model architectures and datasets, localizing to the mediastinal and upper rib regions as well as the lower chest/diaphragmatic regions.
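The AUC values reported above are equivalent to the Mann-Whitney U statistic: the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. A minimal sketch of that computation (the function name and tie handling are illustrative; libraries such as scikit-learn compute this more efficiently):

```python
def auc(labels, scores):
    """Area under the ROC curve via the rank (Mann-Whitney) formulation.

    labels: iterable of 0/1 class labels.
    scores: iterable of model scores, aligned with labels.
    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))
```

A perfectly separating model scores 1.0 on this metric, so the reported 0.98-0.99 internal AUCs indicate near-perfect ranking of male versus female radiographs.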
DCNNs trained on two large CXR datasets accurately predicted sex on internal and external test data, with similar heatmap localizations across DCNN architectures and datasets. These findings support the notion that DCNNs can leverage imaging biomarkers to predict sex, which could confound the accurate prediction of disease on CXRs and contribute to biased models. On the other hand, these DCNNs could aid emergency radiologists in forensic evaluations and in identifying the sex of patients whose identities are unknown, such as in acute trauma.
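The heatmap localizations referenced in the conclusion come from class activation mapping: the CAM for a class is a weighted sum of the final convolutional feature maps, using the trained classifier weights for that class (Zhou et al.). A plain-Python sketch under assumed list-of-lists inputs, independent of any deep learning framework:

```python
def class_activation_map(feature_maps, class_weights):
    """Compute a CAM heatmap as the weighted sum of conv feature maps.

    feature_maps: list of C maps, each an H x W list of lists (floats).
    class_weights: list of C classifier weights for the class of interest.
    Returns an H x W heatmap normalized to [0, 1] for display.
    Illustrative sketch; real pipelines operate on framework tensors.
    """
    h, w = len(feature_maps[0]), len(feature_maps[0][0])
    cam = [[0.0] * w for _ in range(h)]
    for fmap, wt in zip(feature_maps, class_weights):
        for i in range(h):
            for j in range(w):
                cam[i][j] += wt * fmap[i][j]
    lo = min(min(row) for row in cam)
    hi = max(max(row) for row in cam)
    if hi > lo:  # min-max normalize so the heatmap overlays cleanly
        cam = [[(v - lo) / (hi - lo) for v in row] for row in cam]
    return cam
```

Upsampled to the input resolution and overlaid on the radiograph, such maps produce the mediastinal, upper rib, and diaphragmatic attention regions described in the results.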