Petersen Eike, Holm Sune, Ganz Melanie, Feragen Aasa
DTU Compute, Technical University of Denmark, Richard Pedersens Plads, 2800 Kgs. Lyngby, Denmark.
Pioneer Centre for AI, Øster Voldgade 3, 1350 Copenhagen, Denmark.
Patterns (N Y). 2023 Jul 14;4(7):100790. doi: 10.1016/j.patter.2023.100790.
To ensure equitable quality of care, differences in machine learning model performance between patient groups must be addressed. Here, we argue that two separate mechanisms can cause performance differences between groups. First, model performance may be worse than theoretically achievable in a given group. This can occur due to a combination of group underrepresentation, modeling choices, and the characteristics of the prediction task at hand. We examine scenarios in which underrepresentation leads to underperformance, scenarios in which it does not, and the differences between them. Second, the optimal achievable performance may also differ between groups due to differences in the intrinsic difficulty of the prediction task. We discuss several possible causes of such differences in task difficulty. In addition, challenges such as label biases and selection biases may confound both learning and performance evaluation. We highlight consequences for the path toward equal performance, and we emphasize that leveling model performance may require gathering not only data from underperforming groups but also other data. Throughout, we ground our discussion in real-world medical phenomena and case studies while also referencing relevant statistical theory.