Department of Radiology, Mayo Clinic, Scottsdale, Arizona; School of Computing and Augmented Intelligence, Arizona State University, Tempe, Arizona.
Mayo Clinic, Scottsdale, Arizona.
J Am Coll Radiol. 2023 Sep;20(9):842-851. doi: 10.1016/j.jacr.2023.06.025. Epub 2023 Jul 27.
Despite the expert-level performance of artificial intelligence (AI) models on various medical imaging tasks, real-world performance failures, with disparate outputs across subgroups, limit the usefulness of AI in improving patients' lives. Many definitions of fairness have been proposed, along with discussions of the tensions that arise in choosing an appropriate metric for evaluating bias; for example, should one aim for individual or group fairness? One central observation is that AI models engage in "shortcut learning," whereby spurious features on medical images (such as chest tubes and portable radiographic markers on intensive care unit chest radiographs) are used for prediction instead of true pathology. Moreover, AI has been shown to have a remarkable ability to detect the protected attributes of age, sex, and race, while the same models demonstrate bias against historically underserved subgroups of age, sex, and race in disease diagnosis. Therefore, an AI model may take shortcut predictions from these correlations and subsequently generate outcomes that are biased against certain subgroups even when protected attributes are not explicitly used as model inputs; as a result, these subgroups become nonprivileged subgroups. In this review, the authors discuss the types of bias from shortcut learning that may occur at different phases of AI model development, including data bias, modeling bias, and inference bias. The authors then summarize various toolkits that can be used to evaluate and mitigate bias, noting that these have largely been applied to nonmedical domains and require further evaluation for medical AI. The authors also summarize current techniques for mitigating bias during preprocessing (data-centric solutions), model development (computational solutions), and postprocessing (recalibration of learning). Ongoing legal changes under which the use of a biased model will be penalized highlight the necessity of understanding, detecting, and mitigating bias from shortcut learning, an effort that will require diverse research teams examining the whole AI pipeline.
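To make the group-fairness evaluation described above concrete, the following minimal Python sketch (not taken from the article) computes two commonly used group-fairness metrics, the demographic parity difference and the equalized odds difference, for a hypothetical binary classifier evaluated across a protected subgroup. All variable names and the synthetic data are illustrative assumptions, and the simulated predictions deliberately depend on the protected attribute to mimic a shortcut-driven bias.

```python
# Hedged sketch: group-fairness metrics for a hypothetical binary classifier.
# Synthetic data and variable names are assumptions for illustration only.
import numpy as np

def demographic_parity_difference(y_pred, group):
    """Difference in positive-prediction rates between the two groups."""
    rate_a = y_pred[group == 0].mean()
    rate_b = y_pred[group == 1].mean()
    return rate_b - rate_a

def equalized_odds_difference(y_true, y_pred, group):
    """Largest gap in true-positive or false-positive rate between groups."""
    gaps = []
    for label in (1, 0):  # label=1 gives the TPR gap, label=0 the FPR gap
        mask = y_true == label
        rate_a = y_pred[mask & (group == 0)].mean()
        rate_b = y_pred[mask & (group == 1)].mean()
        gaps.append(abs(rate_b - rate_a))
    return max(gaps)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 10_000
    group = rng.integers(0, 2, size=n)    # hypothetical protected attribute
    y_true = rng.integers(0, 2, size=n)   # ground-truth disease label
    # Simulate a model whose positive-prediction rate also depends on the
    # subgroup, i.e., a shortcut correlated with the protected attribute.
    p_pos = 0.6 * y_true + 0.15 * group + 0.1
    y_pred = (rng.random(n) < p_pos).astype(int)

    print("Demographic parity difference:",
          round(demographic_parity_difference(y_pred, group), 3))
    print("Equalized odds difference:",
          round(equalized_odds_difference(y_true, y_pred, group), 3))
```

In practice, the bias-evaluation toolkits mentioned in the review wrap metrics of this kind (along with preprocessing, in-processing, and postprocessing mitigation methods), but their exact APIs vary, so this standalone version is shown only to illustrate the underlying calculation.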