Department of Radiology, Mount Sinai Health System, New York, New York.
Department of Radiology, Upstate University Hospital, Syracuse, New York.
J Am Coll Radiol. 2022 Oct;19(10):1151-1161. doi: 10.1016/j.jacr.2022.06.008. Epub 2022 Aug 11.
Deep learning models increasingly inform medical decision making, for instance in the detection of acute intracranial hemorrhage and pulmonary embolism. However, many models are trained on medical image databases that poorly represent the diversity of the patients they serve. As a result, many artificial intelligence models may perform less well for underrepresented populations when assisting providers with important medical decisions.
To assess the ability of deep learning models to classify the self-reported gender, age, self-reported ethnicity, and insurance status of an individual patient from a given chest radiograph.
Models were trained and tested with 55,174 radiographs in the MIMIC Chest X-ray (MIMIC-CXR) database. External validation data came from two separate databases, one from CheXpert and another from a multihospital urban health care system, after institutional review board approval. Macro-averaged area under the curve (AUC) values were used to evaluate model performance. Code used for this study is open-source and available at https://github.com/ai-bias/cxr-bias and pixelstopatients.com/models/demographics.
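As an illustration of the evaluation metric named above, the following is a minimal sketch of a macro-averaged, one-vs-rest AUC in plain Python. It is not the study's code (that is in the repository cited above); the function names `binary_auc` and `macro_auc` and the input layout (one dictionary of class probabilities per image) are assumptions made for this example.

```python
from typing import Dict, Sequence


def binary_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC for one binary task via the rank-sum (Mann-Whitney U) formulation.

    Tied scores receive their average rank, so a constant score gives 0.5.
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative example")

    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        # Find the run of tied scores starting at position i.
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0  # 1-based average rank for the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    pos_rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    u_stat = pos_rank_sum - n_pos * (n_pos + 1) / 2.0
    return u_stat / (n_pos * n_neg)


def macro_auc(labels: Sequence[str], probs: Sequence[Dict[str, float]]) -> float:
    """Unweighted mean of the one-vs-rest AUCs over all classes in `labels`."""
    classes = sorted(set(labels))
    per_class = []
    for c in classes:
        one_vs_rest = [1 if y == c else 0 for y in labels]
        class_scores = [p[c] for p in probs]
        per_class.append(binary_auc(one_vs_rest, class_scores))
    return sum(per_class) / len(per_class)


if __name__ == "__main__":
    # Hypothetical toy predictions for a three-class label.
    toy_labels = ["a", "a", "b", "c"]
    toy_probs = [
        {"a": 0.70, "b": 0.20, "c": 0.10},
        {"a": 0.25, "b": 0.65, "c": 0.10},
        {"a": 0.30, "b": 0.60, "c": 0.10},
        {"a": 0.20, "b": 0.20, "c": 0.60},
    ]
    print(f"macro-averaged AUC: {macro_auc(toy_labels, toy_probs):.3f}")
```

Macro averaging gives every class equal weight regardless of how many images it has, which is why it is a common choice when class sizes are imbalanced.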
Model performance in predicting gender was nearly perfect, with 0.999 (95% confidence interval: 0.99-0.99) AUC on held-out test data and 0.994 (0.99-0.99) and 0.997 (0.99-0.99) on external validation data. Performance in predicting age and ethnicity was high, ranging from 0.854 (0.80-0.91) to 0.911 (0.88-0.94) AUC, and performance in predicting insurance status was moderate, with AUC ranging from 0.705 (0.60-0.81) on held-out test data to 0.675 (0.54-0.79) on external validation data.
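The abstract does not say how the 95% confidence intervals were computed. One common approach is a percentile bootstrap over test images; the sketch below assumes that approach purely for illustration, and the names `pairwise_auc` and `bootstrap_auc_ci`, the resample count, and the seed are all hypothetical choices rather than details from the study.

```python
import random
from typing import List, Sequence, Tuple


def pairwise_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(
    labels: Sequence[int],
    scores: Sequence[float],
    n_resamples: int = 1000,  # assumed value, not taken from the study
    alpha: float = 0.05,      # 0.05 gives a 95% interval
    seed: int = 0,
) -> Tuple[float, float, float]:
    """Return (point estimate, lower bound, upper bound) using a percentile bootstrap."""
    point = pairwise_auc(labels, scores)  # also validates that both classes are present
    rng = random.Random(seed)
    n = len(labels)
    stats: List[float] = []
    while len(stats) < n_resamples:
        idx = [rng.randrange(n) for _ in range(n)]  # sample cases with replacement
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # resample has only one class, so AUC is undefined; draw again
        stats.append(pairwise_auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * (n_resamples - 1))]
    upper = stats[int((1 - alpha / 2) * (n_resamples - 1))]
    return point, lower, upper


if __name__ == "__main__":
    # Hypothetical binary labels and model scores.
    y = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]
    s = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.7, 0.1, 0.55, 0.45]
    auc, lo, hi = bootstrap_auc_ci(y, s)
    print(f"AUC {auc:.3f} (95% CI: {lo:.2f}-{hi:.2f})")
```

The interval narrows as the number of test images grows, so intervals from a small external validation set will generally be wider than those from a large held-out test set.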
Deep learning models can predict the age, self-reported gender, self-reported ethnicity, and insurance status of a patient from a chest radiograph. Visualization techniques are useful to ensure deep learning models function as intended and to demonstrate anatomical regions of interest. These models can be used to check that training data are diverse, which in turn helps ensure that artificial intelligence models work across diverse populations.