Yi Paul H, Bachina Preetham, Bharti Beepul, Garin Sean P, Kanhere Adway, Kulkarni Pranav, Li David, Parekh Vishwa S, Santomartino Samantha M, Moy Linda, Sulam Jeremias
From the Department of Radiology, St Jude Children's Research Hospital, 262 Danny Thomas Pl, Memphis, TN 38105-3678 (P.H.Y.); Johns Hopkins University School of Medicine, Baltimore, Md (P.B.); Department of Biomedical Engineering, Johns Hopkins University, Baltimore, Md (B.B., J.S.); Uniformed Services University of the Health Sciences, Bethesda, Md (S.P.G.); Institute for Health Computing, University of Maryland School of Medicine, Baltimore, Md (A.K., P.K.); Department of Medical Imaging, Western University Schulich School of Medicine & Dentistry, London, Ontario, Canada (D.L.); Department of Diagnostic and Interventional Imaging, McGovern Medical School at The University of Texas Health Science Center at Houston (UTHealth Houston), Houston, Tex (V.S.P.); Drexel University School of Medicine, Philadelphia, Pa (S.M.S.); and Department of Radiology, New York University Grossman School of Medicine, New York, NY (L.M.).
Radiology. 2025 May;315(2):e241674. doi: 10.1148/radiol.241674.
Despite growing awareness of problems with fairness in artificial intelligence (AI) models in radiology, evaluation of algorithmic biases, or AI biases, remains challenging due to various complexities. These include incomplete reporting of demographic information in medical imaging datasets, variability in definitions of demographic categories, and inconsistent statistical definitions of bias. To guide the appropriate evaluation of AI biases in radiology, this article summarizes the pitfalls in the evaluation and measurement of algorithmic biases. These pitfalls span the spectrum from the technical (eg, how different statistical definitions of bias impact conclusions about whether an AI model is biased) to those associated with social context (eg, how different conventions of race and ethnicity impact identification or masking of biases). Actionable best practices and future directions for avoiding these pitfalls are summarized across three key areas: medical imaging datasets, demographic definitions, and statistical evaluations of bias. Although AI bias in radiology has been broadly reviewed in the recent literature, this article focuses specifically on underrecognized potential pitfalls related to these three key areas. By raising awareness of these pitfalls and providing actionable practices to avoid them, this article aims to ensure that exciting AI technologies can be used in radiology for the good of all people.
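The abstract's point that different statistical definitions of bias can lead to opposite conclusions about the same model lends itself to a short illustration. The following is a minimal Python sketch, not drawn from the article, using entirely synthetic data and hypothetical groups: a model with identical sensitivity and specificity across two demographic groups satisfies an equal-opportunity (equal true-positive rate) criterion yet fails demographic parity once disease prevalence differs between the groups.

```python
import numpy as np

# Synthetic illustration (not from the article): the same model can be judged
# "biased" or "unbiased" depending on which statistical definition is applied.
# All numbers below are made up; the two groups and prevalences are hypothetical.

rng = np.random.default_rng(0)

def simulate_group(prevalence: float, n: int = 10_000):
    """Simulate labels and binary predictions for one demographic group.

    The model's score distribution given the true label is identical across
    groups, so sensitivity and specificity are matched by construction.
    """
    y = rng.binomial(1, prevalence, n)
    score = 0.3 * y + rng.normal(0.35, 0.15, n)  # higher scores for true positives
    return y, (score >= 0.5).astype(int)

# Group B has a higher disease prevalence than group A, a common real-world gap.
y_a, pred_a = simulate_group(prevalence=0.20)
y_b, pred_b = simulate_group(prevalence=0.40)

# Definition 1 -- demographic parity: equal positive prediction rates.
dp_gap = pred_a.mean() - pred_b.mean()

# Definition 2 -- equal opportunity: equal true-positive rates (sensitivity).
eo_gap = pred_a[y_a == 1].mean() - pred_b[y_b == 1].mean()

print(f"demographic parity gap: {dp_gap:+.3f}")  # sizable gap, driven by prevalence
print(f"true-positive-rate gap: {eo_gap:+.3f}")  # near zero
```

Under these assumptions the demographic parity gap is large while the true-positive-rate gap is approximately zero, so a bias audit using only one definition would reach a different conclusion than an audit using the other, which is the pitfall the article highlights.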