Monitoring performance of clinical artificial intelligence in health care: a scoping review.

Authors

Andersen Eline Sandvig, Birk-Korch Johan Baden, Hansen Rasmus Søgaard, Fly Line Haugaard, Röttger Richard, Arcani Diana Maria Cespedes, Brasen Claus Lohman, Brandslund Ivan, Madsen Jonna Skov

Affiliations

Department of Biochemistry and Immunology, Lillebaelt Hospital - University Hospital of Southern Denmark, Vejle, Denmark.

Department of Regional Health Research, University of Southern Denmark, Lillebælt Hospital (Kolding and Vejle), Denmark.

Publication information

JBI Evid Synth. 2024 Dec 1;22(12):2423-2446. doi: 10.11124/JBIES-24-00042.

DOI: 10.11124/JBIES-24-00042
PMID: 39658865
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11630661/

Abstract

OBJECTIVE

The objective of this review was to provide an overview of the diverse methods described, tested, or implemented for monitoring performance of clinical artificial intelligence (AI) systems, while also summarizing the arguments given for or against these methods.

INTRODUCTION

The integration of AI in clinical decision-making is steadily growing. Performances of AI systems evolve over time, necessitating ongoing performance monitoring. However, the evidence on specific monitoring methods is sparse and heterogeneous. Thus, an overview of the evidence on this topic is warranted to guide further research on clinical AI monitoring.

INCLUSION CRITERIA

We included publications detailing metrics or statistical processes employed in systematic, continuous, or repeated initiatives aimed at evaluating or predicting the clinical performance of AI models with direct implications for patient management in health care. No limitations on language or publication date were enforced.

METHODS

We performed systematic database searches in MEDLINE (Ovid), Embase (Ovid), Scopus, and ProQuest Dissertations and Theses Global, supplemented by backward and forward citation searches and gray literature searches. Two or more independent reviewers conducted title and abstract screening, full-text evaluation, and data extraction using a tool developed by the authors. During extraction, the methods identified were divided into subcategories. The results are presented narratively and summarized in tables and graphs.

RESULTS

Thirty-nine sources of evidence were included in the review, with the most abundant source types being opinion papers/narrative reviews (33%) and simulation studies (33%). One guideline on the topic was identified, offering limited guidance on specific metrics and statistical methods. The number of sources included increased year by year, with almost 4 times as many sources included in 2023 compared with 2019. The most commonly reported performance metrics were traditional metrics from the medical literature, including area under the receiver operating characteristics curve (AUROC), sensitivity, specificity, and predictive values, although few arguments were given supporting these choices. Some studies reported on metrics and statistical processing specifically designed to monitor clinical AI.
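
As an illustration of the traditional metrics named above, the following minimal Python sketch computes sensitivity, specificity, predictive values, and AUROC for one monitoring window of labelled predictions. It is not taken from the review or from any included source: the function names, the 0.5 decision threshold, and the toy data are assumptions chosen only for illustration.

```python
from typing import Dict, Optional, Sequence


def _ratio(numerator: int, denominator: int) -> Optional[float]:
    """Return numerator / denominator, or None when the metric is undefined."""
    return numerator / denominator if denominator else None


def confusion_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, Optional[float]]:
    """Sensitivity, specificity, PPV, and NPV from binary labels and binary predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "ppv": _ratio(tp, tp + fp),
        "npv": _ratio(tn, tn + fn),
    }


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> Optional[float]:
    """AUROC as the probability that a random positive case outscores a random
    negative case (ties count as one half). Quadratic in window size, which is
    acceptable for a small illustrative window."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        return None  # undefined when only one class is present in the window
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def window_report(y_true: Sequence[int], scores: Sequence[float],
                  threshold: float = 0.5) -> Dict[str, Optional[float]]:
    """Metrics for one monitoring window; the threshold is an assumed value."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    report = confusion_metrics(y_true, y_pred)
    report["auroc"] = auroc(y_true, scores)
    return report


# Toy window (hypothetical data): observed outcomes and model risk scores.
labels = [1, 1, 0, 0]
risk_scores = [0.9, 0.4, 0.6, 0.1]
report = window_report(labels, risk_scores)
print(report)
```

Repeating `window_report` on successive time windows and comparing the values against a baseline is one simple way such metrics could be tracked over time; the review itself does not prescribe a specific procedure.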

CONCLUSION

This review provides a summary of the methods described for monitoring AI in health care. It reveals a relative scarcity of evidence and guidance for specific practical implementation of performance monitoring of clinical AI. This underscores the imperative for further research, discussion, and guidance regarding the specifics of implementing monitoring for clinical AI. The steady increase in the number of relevant sources published per year suggests that this area of research is gaining increased focus, and the amount of evidence and guidance available will likely increase significantly over the coming years.

REVIEW REGISTRATION

Open Science Framework https://osf.io/afkrn.
