
On evaluation metrics for medical applications of artificial intelligence.

Affiliations

SimulaMet, Oslo, Norway.

Oslo Metropolitan University, Oslo, Norway.

Publication information

Sci Rep. 2022 Apr 8;12(1):5979. doi: 10.1038/s41598-022-09954-8.

DOI:10.1038/s41598-022-09954-8
PMID:35395867
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8993826/
Abstract

Clinicians and software developers need to understand how proposed machine learning (ML) models could improve patient care. No single metric captures all the desirable properties of a model, which is why several metrics are typically reported to summarize a model's performance. Unfortunately, these measures are not easily understandable by many clinicians. Moreover, comparison of models across studies in an objective manner is challenging, and no tool exists to compare models using the same performance metrics. This paper looks at previous ML studies done in gastroenterology, provides an explanation of what different metrics mean in the context of binary classification in the presented studies, and gives a thorough explanation of how different metrics should be interpreted. We also release an open source web-based tool that may be used to aid in calculating the most relevant metrics presented in this paper so that other researchers and clinicians may easily incorporate them into their research.
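To make the idea of reporting several metrics for one binary classifier concrete, the following is a minimal Python sketch that derives a handful of standard metrics from the four confusion-matrix counts. The abstract does not list which metrics the paper or its web tool covers, so the selection here (accuracy, sensitivity, specificity, precision, negative predictive value, F1, and Matthews correlation coefficient) is an assumption, and the function names `binary_metrics` and `_ratio` are hypothetical. It is not the authors' tool.

```python
import math


def _ratio(num, den):
    """Return num / den, or NaN when the denominator is zero (metric undefined)."""
    return num / den if den else float("nan")


def binary_metrics(tp, fp, tn, fn):
    """Compute common binary-classification metrics from confusion-matrix counts.

    tp, fp, tn, fn are the numbers of true positives, false positives,
    true negatives, and false negatives. The choice of metrics is an
    assumption for illustration, not taken from the paper.
    """
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": _ratio(tp + tn, tp + fp + tn + fn),
        "sensitivity": _ratio(tp, tp + fn),   # also called recall
        "specificity": _ratio(tn, tn + fp),
        "precision": _ratio(tp, tp + fp),     # positive predictive value
        "npv": _ratio(tn, tn + fn),           # negative predictive value
        "f1": _ratio(2 * tp, 2 * tp + fp + fn),
        "mcc": _ratio(tp * tn - fp * fn, mcc_den),
    }


# A reasonably balanced example (made-up counts).
balanced = binary_metrics(tp=90, fp=10, tn=80, fn=20)

# An imbalanced example: the model predicts "negative" for every case.
always_negative = binary_metrics(tp=0, fp=0, tn=95, fn=5)

for name, value in balanced.items():
    print(f"{name}: {value:.3f}")
```

The second example illustrates why a single number can mislead: a model that never predicts the positive class still reaches 0.95 accuracy on these made-up counts, while its sensitivity is 0 and its precision and MCC are undefined (returned as NaN in this sketch).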


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e4bd/8993826/24d9988b3f2c/41598_2022_9954_Fig1_HTML.jpg
