Deep learning prediction of sex on chest radiographs: a potential contributor to biased algorithms.

Affiliations

Faculty of Medicine, University of Ottawa, Roger Guindon Hall, 451 Smyth Rd #2044, Ottawa, ON, K1H 8M5, Canada.

University of Maryland Medical Intelligent Imaging (UM2II) Center, Department of Diagnostic Radiology and Nuclear Medicine, University of Maryland School of Medicine, 670 W Baltimore St, Room 1172, Baltimore, MD, 21201, USA.

Publication information

Emerg Radiol. 2022 Apr;29(2):365-370. doi: 10.1007/s10140-022-02019-3. Epub 2022 Jan 10.

DOI: 10.1007/s10140-022-02019-3

PMID: 35006495

Abstract

BACKGROUND

Deep convolutional neural networks (DCNNs) for diagnosis of disease on chest radiographs (CXR) have been shown to be biased against males or females if the datasets used to train them have unbalanced sex representation. Prior work has suggested that DCNNs can predict sex on CXR, which could aid forensic evaluations, but also be a source of bias.

OBJECTIVE

To (1) evaluate the performance of DCNNs for predicting sex across different datasets and architectures and (2) evaluate visual biomarkers used by DCNNs to predict sex on CXRs.

MATERIALS AND METHODS

Chest radiographs were obtained from the Stanford CheXPert and NIH Chest XRay14 datasets, which comprise 224,316 and 112,120 CXRs, respectively. To control for dataset size and class imbalance, random undersampling was used to reduce each dataset to 97,560 images that were balanced for sex. Each dataset was randomly split into training (70%), validation (10%), and test (20%) sets. Four DCNN architectures pre-trained on ImageNet were used for transfer learning. DCNNs were externally validated using a test set from the opposing dataset. Performance was evaluated using area under the receiver operating characteristic curve (AUC). Class activation mapping (CAM) was used to generate heatmaps visualizing the regions contributing to the DCNN's prediction.
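The undersampling and splitting steps described above could be sketched roughly as follows. This is a minimal illustration, not the authors' code: the function name `undersample_and_split`, the `(image_id, sex)` record format, and the "M"/"F" labels are all hypothetical, and the split here is a simple image-level shuffle because the abstract does not say whether splitting was done per patient or stratified by sex.

```python
import random


def undersample_and_split(records, n_per_class, seed=0):
    """Balance records by sex, then split them 70/10/20.

    records: list of (image_id, sex) tuples with sex in {"M", "F"}
             (a hypothetical format chosen for this sketch).
    n_per_class: number of images to keep for each sex.
    """
    rng = random.Random(seed)

    # Group records by class label.
    by_sex = {"M": [], "F": []}
    for image_id, sex in records:
        by_sex[sex].append((image_id, sex))

    # Random undersampling: draw the same number of images from each
    # class, without replacement, so the result is balanced for sex.
    balanced = []
    for sex in ("M", "F"):
        balanced.extend(rng.sample(by_sex[sex], n_per_class))
    rng.shuffle(balanced)

    # Integer arithmetic avoids floating-point rounding in the split sizes.
    n = len(balanced)
    n_train = n * 70 // 100
    n_val = n * 10 // 100
    train = balanced[:n_train]
    val = balanced[n_train:n_train + n_val]
    test = balanced[n_train + n_val:]
    return train, val, test


# Toy data: 60 male and 40 female images, reduced to 30 per class.
toy = [(f"img{i}", "M") for i in range(60)]
toy += [(f"img{i + 60}", "F") for i in range(40)]
train, val, test = undersample_and_split(toy, n_per_class=30, seed=1)
print(len(train), len(val), len(test))  # 42 6 12
```

With the abstract's figure of 97,560 balanced images (48,780 per sex), the same arithmetic gives 68,292 training, 9,756 validation, and 19,512 test images.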

RESULTS

On the internal test set, DCNNs achieved AUCs ranging from 0.98 to 0.99. On external validation, the models reached peak cross-dataset AUCs of 0.94 for the VGG19-Stanford model and 0.95 for the InceptionV3-NIH model. Heatmaps highlighted similar regions of attention between model architectures and datasets, localizing to the mediastinal and upper rib regions, as well as to the lower chest/diaphragmatic regions.
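For readers less familiar with the metric reported here, the AUC can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The sketch below computes it directly from that definition. The function name `roc_auc` and the encoding of one sex as label 1 are assumptions made for illustration; the pairwise loop is quadratic and only suitable for small examples, not for test sets of the size used in the study.

```python
def roc_auc(labels, scores):
    """AUC from its pairwise definition.

    labels: iterable of 0/1 class labels (which sex is coded as 1 is an
            arbitrary choice for this sketch).
    scores: model scores, higher meaning "more likely class 1".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one example of each class")

    # Count positive/negative pairs ranked correctly; ties count as half.
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Three of the four positive/negative pairs are ordered correctly.
print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))  # 0.75
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which is the scale on which the values above should be read.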

CONCLUSION

DCNNs trained on two large CXR datasets accurately predicted sex on internal and external test data, with similar heatmap localizations across DCNN architectures and datasets. These findings support the notion that DCNNs can leverage imaging biomarkers to predict sex, which could confound the accurate prediction of disease on CXRs and contribute to biased models. On the other hand, these DCNNs could help emergency radiologists with forensic evaluations and with determining the sex of patients whose identities are unknown, such as in acute trauma.
