James L. Cross, Michael A. Choma, John A. Onofrey
Yale School of Medicine, New Haven, Connecticut, United States of America.
Department of Radiology & Biomedical Imaging, Yale University, New Haven, Connecticut, United States of America.
PLOS Digit Health. 2024 Nov 7;3(11):e0000651. doi: 10.1371/journal.pdig.0000651. eCollection 2024 Nov.
Biases in medical artificial intelligence (AI) arise and compound throughout the AI lifecycle. These biases can have significant clinical consequences, especially in applications that involve clinical decision-making. Left unaddressed, biased medical AI can lead to substandard clinical decisions and the perpetuation and exacerbation of longstanding healthcare disparities. We discuss potential biases that can arise at different stages of the AI development pipeline and how they can affect AI algorithms and clinical decision-making. Bias can occur in data features and labels, model development and evaluation, deployment, and publication. Insufficient sample sizes for certain patient groups can result in suboptimal performance, algorithm underestimation, and clinically meaningless predictions. Missing patient findings can also produce biased model behavior; these include data that are capturable but nonrandomly missing, such as diagnosis codes, and data that are not usually or easily captured, such as social determinants of health. Expert-annotated labels used to train supervised learning models may reflect implicit cognitive biases or substandard care practices. Overreliance on performance metrics during model development may obscure bias and diminish a model's clinical utility. When applied to data outside the training cohort, model performance can deteriorate relative to prior validation and can do so differentially across subgroups. How end users interact with deployed solutions can introduce bias. Finally, where models are developed and published, and by whom, shapes the trajectories and priorities of future medical AI development. Solutions to mitigate bias must be implemented with care; these include the collection of large and diverse data sets, statistical debiasing methods, thorough model evaluation, emphasis on model interpretability, and standardized bias reporting and transparency requirements. Before real-world implementation in clinical settings, rigorous validation through clinical trials is critical to demonstrate unbiased application. Addressing biases across all model development stages is crucial for ensuring that all patients benefit equitably from the future of medical AI.
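To make the abstract's point about aggregate metrics concrete, the following is a minimal sketch (not from the paper) of subgroup-stratified evaluation in Python. The cohort, feature names, subgroup label, and model are all hypothetical, and scikit-learn is assumed; the idea is simply that a performance metric should be reported per patient group rather than only in aggregate.

```python
# Hypothetical illustration of subgroup-stratified evaluation: an overall
# AUROC can look acceptable while masking a disparity in an underrepresented
# group. All data below are synthetic.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 2000
# Synthetic cohort: two features, a binary outcome, and a subgroup label
# ("group") that is unevenly represented, mimicking sample-size bias.
df = pd.DataFrame({
    "x1": rng.normal(size=n),
    "x2": rng.normal(size=n),
    "group": rng.choice(["A", "B"], size=n, p=[0.9, 0.1]),
})
logit = 1.5 * df["x1"] - 0.5 * df["x2"]
df["y"] = rng.binomial(1, 1 / (1 + np.exp(-logit)))

train, test = train_test_split(df, test_size=0.5, random_state=0)
model = LogisticRegression().fit(train[["x1", "x2"]], train["y"])
test = test.assign(score=model.predict_proba(test[["x1", "x2"]])[:, 1])

# Report the metric overall and per subgroup, with subgroup sample sizes.
print(f"overall AUROC: {roc_auc_score(test['y'], test['score']):.3f}")
for g, sub in test.groupby("group"):
    print(f"group {g} (n={len(sub)}) AUROC: "
          f"{roc_auc_score(sub['y'], sub['score']):.3f}")
```

Reporting the per-group sample size alongside each metric also flags groups whose estimates are statistically unstable, which connects to the abstract's warning about insufficient sample sizes for certain patient groups.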