Arizona State University, SCAI, Tempe, AZ, 85281, USA.
University of California Los Angeles, Los Angeles, CA, USA.
J Biomed Inform. 2024 Jan;149:104548. doi: 10.1016/j.jbi.2023.104548. Epub 2023 Dec 1.
A major hurdle for the real-time deployment of AI models is ensuring that these models remain trustworthy for unseen populations. More often than not, these complex models are black boxes that produce promising results; when scrutinized, however, they begin to reveal implicit biases in their decision making, particularly for minority subgroups.
We develop an efficient adversarial de-biasing approach with partial learning that incorporates the existing concept activation vectors (CAV) methodology to reduce racial disparities while preserving performance on the targeted task. CAV is originally a model-interpretability technique, which we adapt to identify the convolution layers responsible for learning race; we then fine-tune only up to that layer instead of fine-tuning the complete network, limiting the drop in performance.

RESULTS: The methodology was evaluated on two independent medical-imaging case studies, chest X-rays and mammograms, and we also performed external validation on a different racial population. On the external datasets for the chest X-ray use case, the de-biased models (average AUC 0.87) outperformed both the baseline convolution models (average AUC 0.57) and the models trained with the popular fine-tuning strategy (average AUC 0.81). Moreover, the mammogram model was de-biased using a single dataset (White, Black, and Asian patients) and improved performance on an external dataset (average AUC from 0.80 to 0.86) with a completely different population (primarily Hispanic patients).
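To make the layer-selection step above concrete, the sketch below probes a generic torchvision ResNet with a CAV-style linear classifier at each convolution stage; the deepest stage where the probe beats chance is treated as the layer that encodes race, and later layers are frozen. Everything here (the ResNet-18 backbone, the placeholder batches, the activations helper, the assumed cutoff) is an illustrative assumption, not the authors' released implementation.

```python
# Minimal sketch of a CAV-style race-concept probe (assumptions, not the paper's code).
import numpy as np
import torch
import torchvision.models as tvm
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

model = tvm.resnet18(weights=None).eval()
stages = ["layer1", "layer2", "layer3", "layer4"]

def activations(layer_name: str, images: torch.Tensor) -> np.ndarray:
    """Flattened activations of the named ResNet stage for a batch of images."""
    store = {}
    hook = dict(model.named_modules())[layer_name].register_forward_hook(
        lambda mod, inp, out: store.update(a=out.flatten(1))
    )
    with torch.no_grad():
        model(images)
    hook.remove()
    return store["a"].numpy()

concept_imgs = torch.randn(32, 3, 64, 64)  # placeholder: images from one racial group
random_imgs = torch.randn(32, 3, 64, 64)   # placeholder: mixed "random" images

for name in stages:
    X = np.concatenate([activations(name, concept_imgs),
                        activations(name, random_imgs)])
    y = np.array([1] * 32 + [0] * 32)
    acc = cross_val_score(LogisticRegression(max_iter=500), X, y, cv=3).mean()
    print(f"{name}: race-concept probe accuracy {acc:.2f}")

# Fine-tune only up to the last race-encoding stage; freeze everything after it.
cutoff = "layer2"  # hypothetical outcome of the probe above
freeze = False
for name, module in model.named_children():
    if freeze:
        for p in module.parameters():
            p.requires_grad = False
    if name == cutoff:
        freeze = True
```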
In this study, we demonstrated that adversarial models trained only on internal data performed comparably to, and often outperformed, the standard fine-tuning strategy that uses data from an external setting. The adversarial training approach described here can be applied regardless of the predictor's architecture, as long as the convolution model is trained with a gradient-based method. We release the training code under an academic open-source license at https://github.com/ramon349/JBI2023_TCAV_debiasing.
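The abstract does not spell out the adversarial objective, so the following sketch uses the common gradient-reversal formulation as a stand-in: a race-classification head receives reversed gradients, pushing the shared encoder to discard race information while the main task head is trained normally. All module names and the toy data are assumptions; the authors' actual objective and training loop are in the linked repository.

```python
# Hedged sketch of adversarial de-biasing via gradient reversal (an assumed
# formulation, not necessarily the paper's exact objective).
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad):
        # Reverse (and scale) gradients flowing back into the encoder.
        return -ctx.lam * grad, None

encoder = nn.Sequential(nn.Conv2d(3, 16, 3, 2, 1), nn.ReLU(),
                        nn.AdaptiveAvgPool2d(1), nn.Flatten())
task_head = nn.Linear(16, 2)   # e.g., disease present / absent
race_head = nn.Linear(16, 3)   # adversary: racial-group classifier

opt = torch.optim.Adam([*encoder.parameters(), *task_head.parameters(),
                        *race_head.parameters()], lr=1e-4)
ce = nn.CrossEntropyLoss()

x = torch.randn(8, 3, 64, 64)       # placeholder image batch
y_task = torch.randint(0, 2, (8,))  # placeholder task labels
y_race = torch.randint(0, 3, (8,))  # placeholder race labels

feats = encoder(x)
loss = ce(task_head(feats), y_task) \
     + ce(race_head(GradReverse.apply(feats, 1.0)), y_race)
opt.zero_grad()
loss.backward()
opt.step()
```

Because the adversary acts only through gradients, this setup matches the conclusion's claim: it works with any convolution architecture trained by a gradient-based method.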