A data-driven framework for identifying patient subgroups on which an AI/machine learning model may underperform.

Authors

Subbaswamy Adarsh, Sahiner Berkman, Petrick Nicholas, Pai Vinay, Adams Roy, Diamond Matthew C, Saria Suchi

Affiliations

Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA.

Center for Devices and Radiological Health, U.S. Food and Drug Administration, Silver Spring, MD, USA.

Publication information

NPJ Digit Med. 2024 Nov 21;7(1):334. doi: 10.1038/s41746-024-01275-6.

DOI:10.1038/s41746-024-01275-6
PMID:39572755
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11582698/
Abstract

A fundamental goal of evaluating the performance of a clinical model is to ensure it performs well across a diverse intended patient population. A primary challenge is that the data used in model development and testing often consist of many overlapping, heterogeneous patient subgroups that may not be explicitly defined or labeled. While a model's average performance on a dataset may be high, the model can have significantly lower performance for certain subgroups, which may be hard to detect. We describe an algorithmic framework for identifying subgroups with potential performance disparities (AFISP), which produces a set of interpretable phenotypes corresponding to subgroups for which the model's performance may be relatively lower. This could allow model evaluators, including developers and users, to identify possible failure modes prior to wide-scale deployment. We illustrate the application of AFISP by applying it to a patient deterioration model to detect significant subgroup performance disparities, and show that AFISP is significantly more scalable than existing algorithmic approaches.
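The abstract's central point, that a high average score can hide a much lower score on a subgroup, can be illustrated with a minimal sketch. This is not AFISP: it assumes subgroup labels are already available, whereas the framework targets subgroups that may not be explicitly defined or labeled. The record layout, the function names, the 0.10 margin, and the toy data are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def subgroup_accuracy(records):
    """Return (overall_accuracy, {subgroup: accuracy}).

    Each record is a hypothetical (subgroup, y_true, y_pred) tuple.
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for subgroup, y_true, y_pred in records:
        totals[subgroup] += 1
        correct[subgroup] += int(y_true == y_pred)
    overall = sum(correct.values()) / sum(totals.values())
    per_group = {g: correct[g] / totals[g] for g in totals}
    return overall, per_group


def flag_underperforming(records, margin=0.10):
    """List subgroups whose accuracy is more than `margin` below the overall accuracy.

    The margin is an arbitrary illustrative threshold, not a value from the paper.
    """
    overall, per_group = subgroup_accuracy(records)
    return sorted(g for g, acc in per_group.items() if acc < overall - margin)


# Synthetic toy data (not from the study): 90 patients in group "A", all
# predicted correctly, and 10 patients in group "B", only 4 predicted correctly.
records = [("A", 1, 1)] * 90 + [("B", 1, 1)] * 4 + [("B", 1, 0)] * 6

overall, per_group = subgroup_accuracy(records)
flagged = flag_underperforming(records)
print(f"overall accuracy: {overall:.2f}")
for group, acc in sorted(per_group.items()):
    print(f"group {group}: {acc:.2f}")
print("flagged subgroups:", flagged)
```

In this toy dataset the small group "B" barely moves the overall accuracy, so the disparity only becomes visible once performance is broken down by subgroup.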

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8256/11582698/7549528ef171/41746_2024_1275_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8256/11582698/da4c23d772d2/41746_2024_1275_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8256/11582698/bb027fcadbe6/41746_2024_1275_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8256/11582698/08a9b48b4429/41746_2024_1275_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8256/11582698/885584e0335c/41746_2024_1275_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8256/11582698/254d35f2d5dd/41746_2024_1275_Fig6_HTML.jpg
