

Considerations in the reliability and fairness audits of predictive models for advance care planning.

Authors

Lu Jonathan, Sattler Amelia, Wang Samantha, Khaki Ali Raza, Callahan Alison, Fleming Scott, Fong Rebecca, Ehlert Benjamin, Li Ron C, Shieh Lisa, Ramchandran Kavitha, Gensheimer Michael F, Chobot Sarah, Pfohl Stephen, Li Siyun, Shum Kenny, Parikh Nitin, Desai Priya, Seevaratnam Briththa, Hanson Melanie, Smith Margaret, Xu Yizhe, Gokhale Arjun, Lin Steven, Pfeffer Michael A, Teuteberg Winifred, Shah Nigam H

Affiliations

Center for Biomedical Informatics Research, Department of Medicine, Stanford University School of Medicine, Palo Alto, United States.

Stanford Healthcare AI Applied Research Team, Division of Primary Care and Population Health, Department of Medicine, Stanford University School of Medicine, Palo Alto, United States.

Publication

Front Digit Health. 2022 Sep 12;4:943768. doi: 10.3389/fdgth.2022.943768. eCollection 2022.

DOI: 10.3389/fdgth.2022.943768
PMID: 36339512
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9634737/
Abstract

Multiple reporting guidelines for artificial intelligence (AI) models in healthcare recommend that models be audited for reliability and fairness. However, operational guidance for performing reliability and fairness audits in practice is lacking. Following guideline recommendations, we conducted a reliability audit of two models based on model performance and calibration, as well as a fairness audit based on summary statistics, subgroup performance, and subgroup calibration. We assessed the Epic End-of-Life (EOL) Index model and an internally developed Stanford Hospital Medicine (HM) Advance Care Planning (ACP) model in 3 practice settings: Primary Care, Inpatient Oncology, and Hospital Medicine, using clinicians' answers to the surprise question ("Would you be surprised if [patient X] passed away in [Y years]?") as a surrogate outcome. For performance, the models had positive predictive value (PPV) at or above 0.76 in all settings. In Hospital Medicine and Inpatient Oncology, the Stanford HM ACP model had higher sensitivity (0.69, 0.89 respectively) than the EOL model (0.20, 0.27), and better calibration (O/E 1.5, 1.7) than the EOL model (O/E 2.5, 3.0). The Epic EOL model flagged fewer patients (11%, 21% respectively) than the Stanford HM ACP model (38%, 75%). There were no differences in performance and calibration by sex. Both models had lower sensitivity in Hispanic/Latino male patients with Race listed as "Other." 10 clinicians were surveyed after a presentation summarizing the audit. 10/10 reported that summary statistics, overall performance, and subgroup performance would affect their decision to use the model to guide care; 9/10 said the same for overall and subgroup calibration. The most commonly identified barriers to routinely conducting such reliability and fairness audits were poor demographic data quality and lack of data access. This audit required 115 person-hours across 8-10 months.
Our recommendations for performing reliability and fairness audits include verifying data validity, analyzing model performance on intersectional subgroups, and collecting clinician-patient linkages as necessary for label generation by clinicians. Those responsible for AI models should require such audits before model deployment and mediate between model auditors and impacted stakeholders.
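The audit metrics named in the abstract can be made concrete with a small sketch. The following is a minimal illustration (not the authors' code, and the patient records are made up) of computing positive predictive value (PPV), sensitivity, the flagged-patient rate, and calibration as an observed/expected (O/E) event ratio, both overall and per subgroup, as a fairness audit requires:

```python
# Minimal sketch of a reliability/fairness audit computation.
# Assumptions (not from the paper): a fixed decision threshold on the model's
# risk score, and a binary surrogate outcome per patient (1 = clinician would
# NOT be surprised by death within the horizon). All data are illustrative.
from collections import defaultdict

def audit_metrics(records, threshold=0.5):
    """records: list of dicts with keys 'risk' (model probability),
    'outcome' (0/1 surrogate label), 'group' (subgroup label).
    Returns {group: {'ppv', 'sensitivity', 'oe_ratio', 'flagged_rate'}}."""
    by_group = defaultdict(list)
    for r in records:
        by_group[r["group"]].append(r)
        by_group["overall"].append(r)  # pool everything for overall metrics

    results = {}
    for group, rows in by_group.items():
        tp = sum(1 for r in rows if r["risk"] >= threshold and r["outcome"] == 1)
        fp = sum(1 for r in rows if r["risk"] >= threshold and r["outcome"] == 0)
        fn = sum(1 for r in rows if r["risk"] < threshold and r["outcome"] == 1)
        observed = sum(r["outcome"] for r in rows)   # events that occurred
        expected = sum(r["risk"] for r in rows)      # sum of predicted risks
        results[group] = {
            "ppv": tp / (tp + fp) if tp + fp else None,
            "sensitivity": tp / (tp + fn) if tp + fn else None,
            "oe_ratio": observed / expected if expected else None,  # >1 = underprediction
            "flagged_rate": (tp + fp) / len(rows),
        }
    return results

# Illustrative usage with made-up patients in two subgroups.
data = [
    {"risk": 0.9, "outcome": 1, "group": "A"},
    {"risk": 0.8, "outcome": 1, "group": "A"},
    {"risk": 0.7, "outcome": 0, "group": "B"},
    {"risk": 0.2, "outcome": 1, "group": "B"},
    {"risk": 0.1, "outcome": 0, "group": "A"},
]
m = audit_metrics(data)
```

Comparing `m["A"]` against `m["B"]` and `m["overall"]` is the shape of the subgroup analysis the paper recommends; a production audit would additionally slice by intersectional subgroups (e.g. ethnicity by sex by race) and verify the demographic fields before trusting the slices.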


Similar Articles

1. Considerations in the reliability and fairness audits of predictive models for advance care planning.
Front Digit Health. 2022 Sep 12;4:943768. doi: 10.3389/fdgth.2022.943768. eCollection 2022.
2. Rural Hispanic/Latino cancer patients' perspectives on facilitators, barriers, and suggestions for advance care planning: A qualitative study.
Palliat Support Care. 2022 Aug;20(4):535-541. doi: 10.1017/S1478951521001498.
3. Advance care planning in the oncology settings.
Int J Evid Based Healthc. 2013 Jun;11(2):110-4. doi: 10.1111/1744-1609.12011.
4. Intra- and inter-rater reliability of an electronic health record audit used in a chiropractic teaching clinic system: an observational study.
BMC Health Serv Res. 2021 Jul 28;21(1):750. doi: 10.1186/s12913-021-06745-1.
5. A translational perspective towards clinical AI fairness.
NPJ Digit Med. 2023 Sep 14;6(1):172. doi: 10.1038/s41746-023-00918-4.
6. Association of Advance Care Planning Visits With Intensity of Health Care for Medicare Beneficiaries With Serious Illness at the End of Life.
JAMA Health Forum. 2021 Jul 30;2(7):e211829. doi: 10.1001/jamahealthforum.2021.1829. eCollection 2021 Jul.
7. Normalising advance care planning in a general medicine service of a tertiary hospital: an exploratory study.
Aust Health Rev. 2016 Sep;40(4):391-398. doi: 10.1071/AH15068.
8. How Well Does the Surprise Question Predict 1-year Mortality for Patients Admitted with COPD?
J Gen Intern Med. 2021 Sep;36(9):2656-2662. doi: 10.1007/s11606-020-06512-8. Epub 2021 Jan 6.
9. Current Status of Advance Care Planning and End-of-life Communication for Patients with Advanced and Metastatic Breast Cancer.
Oncologist. 2021 Apr;26(4):e686-e693. doi: 10.1002/onco.13640. Epub 2021 Jan 2.
10. The future of Cochrane Neonatal.
Early Hum Dev. 2020 Nov;150:105191. doi: 10.1016/j.earlhumdev.2020.105191. Epub 2020 Sep 12.

Cited By

1. Navigating Healthcare AI Governance: the Comprehensive Algorithmic Oversight and Stewardship Framework for Risk and Equity.
Health Care Anal. 2025 Aug 13. doi: 10.1007/s10728-025-00537-y.
2. External validation of a proprietary risk model for 1-year mortality in community-dwelling adults aged 65 years or older.
J Am Med Inform Assoc. 2025 Jul 1;32(7):1110-1119. doi: 10.1093/jamia/ocaf062.
3. Approaches to identify scenarios for data science implementations within healthcare settings: recommendations based on experiences at multiple academic institutions.
Front Digit Health. 2025 Mar 14;7:1511943. doi: 10.3389/fdgth.2025.1511943. eCollection 2025.
4. Comparison of 1-year mortality predictions from vendor-supplied academic model for cancer patients.
PeerJ. 2025 Feb 11;13:e18958. doi: 10.7717/peerj.18958. eCollection 2025.
5. Developing a Research Center for Artificial Intelligence in Medicine.
Mayo Clin Proc Digit Health. 2024 Dec;2(4):677-686. doi: 10.1016/j.mcpdig.2024.07.005. Epub 2024 Oct 28.
6. Mitigating Algorithmic Bias in AI-Driven Cardiovascular Imaging for Fairer Diagnostics.
Diagnostics (Basel). 2024 Nov 27;14(23):2675. doi: 10.3390/diagnostics14232675.
7. The promises and limitations of artificial intelligence for quality improvement, patient safety, and research in hospital medicine.
J Hosp Med. 2025 Jan;20(1):85-88. doi: 10.1002/jhm.13404. Epub 2024 May 15.
8. Managing risk and resilience in autonomous and intelligent systems: Exploring safety in the development, deployment, and use of artificial intelligence in healthcare.
Risk Anal. 2025 Apr;45(4):910-927. doi: 10.1111/risa.14273. Epub 2024 Jan 21.
9. Stronger regulation of AI in biomedicine.
Sci Transl Med. 2023 Sep 13;15(713):eadi0336. doi: 10.1126/scitranslmed.adi0336.
10. DEPLOYR: a technical framework for deploying custom real-time machine learning models into the electronic medical record.
J Am Med Inform Assoc. 2023 Aug 18;30(9):1532-1542. doi: 10.1093/jamia/ocad114.

References

1. Assessment of Adherence to Reporting Guidelines by Commonly Used Clinical Prediction Models From a Single Vendor: A Systematic Review.
JAMA Netw Open. 2022 Aug 1;5(8):e2227779. doi: 10.1001/jamanetworkopen.2022.27779.
2. The harm of class imbalance corrections for risk prediction models: illustration and simulation using logistic regression.
J Am Med Inform Assoc. 2022 Aug 16;29(9):1525-1534. doi: 10.1093/jamia/ocac093.
3. Conceptualizing, Contextualizing, and Operationalizing Race in Quantitative Health Sciences Research.
Ann Fam Med. 2022 Mar-Apr;20(2):157-163. doi: 10.1370/afm.2792. Epub 2022 Jan 19.
4. A survey of extant organizational and computational setups for deploying predictive models in health systems.
J Am Med Inform Assoc. 2021 Oct 12;28(11):2445-2450. doi: 10.1093/jamia/ocab154.
5. External Validation of a Widely Implemented Proprietary Sepsis Prediction Model in Hospitalized Patients.
JAMA Intern Med. 2021 Aug 1;181(8):1065-1070. doi: 10.1001/jamainternmed.2021.2626.
6. Minimum sample size for external validation of a clinical prediction model with a binary outcome.
Stat Med. 2021 Aug 30;40(19):4230-4251. doi: 10.1002/sim.9025. Epub 2021 May 24.
7. Comparison of Methods to Reduce Bias From Clinical Prediction Models of Postpartum Depression.
JAMA Netw Open. 2021 Apr 1;4(4):e213909. doi: 10.1001/jamanetworkopen.2021.3909.
8. Reporting of demographic data and representativeness in machine learning models using electronic health records.
J Am Med Inform Assoc. 2020 Dec 9;27(12):1878-1884. doi: 10.1093/jamia/ocaa164.
9. Reporting guidelines for clinical trial reports for interventions involving artificial intelligence: the CONSORT-AI extension.
Nat Med. 2020 Sep;26(9):1364-1374. doi: 10.1038/s41591-020-1034-x. Epub 2020 Sep 9.
10. Minimum information about clinical artificial intelligence modeling: the MI-CLAIM checklist.
Nat Med. 2020 Sep;26(9):1320-1324. doi: 10.1038/s41591-020-1041-y.