Stanley Emma A M, Wilms Matthias, Mouches Pauline, Forkert Nils D
University of Calgary, Department of Biomedical Engineering, Calgary, Alberta, Canada.
University of Calgary, Department of Radiology, Calgary, Alberta, Canada.
J Med Imaging (Bellingham). 2022 Nov;9(6):061102. doi: 10.1117/1.JMI.9.6.061102. Epub 2022 Aug 26.
Explainability and fairness are two key factors for the effective and ethical clinical implementation of deep learning models in healthcare settings. However, there has been little work investigating how unfair performance manifests in explainable artificial intelligence (XAI) methods, and how XAI can be used to investigate potential reasons for unfairness. Thus, the aim of this work was to analyze the effects of previously established sociodemographic-related confounders on classifier performance and explainability methods. A convolutional neural network (CNN) was trained to predict biological sex from T1-weighted brain MRI datasets of 4547 adolescents aged 9 to 10 years from the Adolescent Brain Cognitive Development (ABCD) study. Performance disparities of the trained CNN between White and Black subjects were analyzed, and saliency maps were generated for each subgroup at the intersection of sex and race. The classification model demonstrated a significant difference in the percentage of correctly classified White male ( ) and Black male ( ) children. Conversely, slightly higher performance was found for Black female ( ) than for White female ( ) children. Saliency maps showed subgroup-specific differences corresponding to brain regions previously associated with pubertal development. In line with this finding, the average pubertal development scores of the subjects in this study differed significantly between Black and White females ( ) and males ( ). We demonstrate that a CNN whose sex classification performance differs significantly between Black and White adolescents identifies different important brain regions when subgroup saliency maps are compared. Importance scores vary substantially between subgroups within brain structures associated with pubertal development, a race-associated confounder for predicting sex. We illustrate that unfair models can produce different XAI results between subgroups and that these results may point to potential reasons for biased performance.
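As a rough illustration of the subgroup saliency analysis described in the abstract, the sketch below computes voxel-wise importance maps with a simple input-gradient method and averages them within each sex-by-race subgroup. This is a minimal sketch, assuming a trained PyTorch 3D CNN; the `model`, the data loader interface, and the `group_of` helper are hypothetical placeholders, since the abstract does not specify the authors' exact XAI implementation.

```python
# Hypothetical sketch: gradient-based saliency maps averaged per subgroup.
# Assumes a trained 3D CNN (`model`) that outputs a sex-classification logit
# for a T1-weighted volume of shape (1, 1, D, H, W).
import torch

def saliency_map(model, volume):
    """Return |d(logit)/d(input)| as a voxel-wise importance map."""
    model.eval()
    volume = volume.clone().requires_grad_(True)
    logit = model(volume)               # scalar sex-classification logit
    logit.sum().backward()              # gradients w.r.t. input voxels
    return volume.grad.abs().squeeze()  # (D, H, W) importance scores

def subgroup_average(model, loader, group_of):
    """Average saliency maps over each (sex, race) subgroup.

    `loader` yields (volume, meta) pairs; `group_of(meta)` returns a
    subgroup key such as ("female", "Black"). Both interfaces are
    assumptions made for this sketch.
    """
    sums, counts = {}, {}
    for volume, meta in loader:
        key = group_of(meta)
        sums[key] = sums.get(key, 0) + saliency_map(model, volume)
        counts[key] = counts.get(key, 0) + 1
    return {k: sums[k] / counts[k] for k in sums}
```

Comparing the returned per-subgroup average maps (e.g., White male vs. Black male) is one straightforward way to surface the kind of subgroup-specific importance differences the study reports, such as diverging scores within brain structures linked to pubertal development.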