A clinician's guide to understanding and critically appraising machine learning studies: a checklist for Ruling Out Bias Using Standard Tools in Machine Learning (ROBUST-ML).

Author information

Al-Zaiti Salah S, Alghwiri Alaa A, Hu Xiao, Clermont Gilles, Peace Aaron, Macfarlane Peter, Bond Raymond

Affiliations

Department of Acute and Tertiary Care, Department of Emergency Medicine, and Division of Cardiology, University of Pittsburgh, Pittsburgh PA, USA.

Data Science Core, The Provost Office, University of Pittsburgh, Pittsburgh PA, USA.

Publication information

Eur Heart J Digit Health. 2022 Apr 12;3(2):125-140. doi: 10.1093/ehjdh/ztac016. eCollection 2022 Jun.

DOI:10.1093/ehjdh/ztac016

PMID:36713011

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9708024/

Abstract

Developing functional machine learning (ML)-based models to address unmet clinical needs requires unique considerations for optimal clinical utility. Recent debates about the rigour, transparency, explainability, and reproducibility of ML models, terms which are defined in this article, have raised concerns about their clinical utility and suitability for integration in current evidence-based practice paradigms. This featured article focuses on increasing the literacy of ML among clinicians by providing them with the knowledge and tools needed to understand and critically appraise clinical studies focused on ML. A checklist is provided for evaluating the rigour and reproducibility of the four ML building blocks: data curation, feature engineering, model development, and clinical deployment. Checklists like this are important for quality assurance and to ensure that ML studies are rigorously and confidently reviewed by clinicians and are guided by domain knowledge of the setting in which the findings will be applied. Bridging the gap between clinicians, healthcare scientists, and ML engineers can address many shortcomings and pitfalls of ML-based solutions and their potential deployment at the bedside.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/91c9/9708024/3016f106dd7b/ztac016f1.jpg

Similar articles

A clinician's guide to understanding and critically appraising machine learning studies: a checklist for Ruling Out Bias Using Standard Tools in Machine Learning (ROBUST-ML).

Eur Heart J Digit Health. 2022 Apr 12;3(2):125-140. doi: 10.1093/ehjdh/ztac016. eCollection 2022 Jun.

A Clinician's Guide to Artificial Intelligence: How to Critically Appraise Machine Learning Studies.

Transl Vis Sci Technol. 2020 Feb 12;9(2):7. doi: 10.1167/tvst.9.2.7.

The future of Cochrane Neonatal.

Early Hum Dev. 2020 Nov;150:105191. doi: 10.1016/j.earlhumdev.2020.105191. Epub 2020 Sep 12.

Critically reading machine learning literature in neurosurgery: a reader's guide and checklist for appraising prediction models.

Neurosurg Focus. 2023 Jun;54(6):E3. doi: 10.3171/2023.3.FOCUS2352.

A Clinician's Guide to Critically Appraising Randomized Controlled Trials in the Field of Speech-Language Pathology.

Am J Speech Lang Pathol. 2023 Mar 9;32(2):411-425. doi: 10.1044/2022_AJSLP-22-00180. Epub 2023 Feb 7.

Practical guide to building machine learning-based clinical prediction models using imbalanced datasets.

Trauma Surg Acute Care Open. 2024 Jun 12;9(1):e001222. doi: 10.1136/tsaco-2023-001222. eCollection 2024.

An integration engineering framework for machine learning in healthcare.

Front Digit Health. 2022 Aug 4;4:932411. doi: 10.3389/fdgth.2022.932411. eCollection 2022.

Critical Care Network in the State of Qatar.

Qatar Med J. 2019 Nov 7;2019(2):2. doi: 10.5339/qmj.2019.qccc.2. eCollection 2019.

Healthcare stakeholders' perceptions and experiences of factors affecting the implementation of critical care telemedicine (CCT): qualitative evidence synthesis.

Cochrane Database Syst Rev. 2021 Feb 18;2(2):CD012876. doi: 10.1002/14651858.CD012876.pub2.

Supervised Machine Learning in Oncology: A Clinician's Guide.

Dig Dis Interv. 2020 Mar;4(1):73-81. doi: 10.1055/s-0040-1705097.

Cited by

Detecting papilloedema as a marker of raised intracranial pressure using artificial intelligence: A systematic review.

PLOS Digit Health. 2025 Sep 2;4(9):e0000783. doi: 10.1371/journal.pdig.0000783. eCollection 2025 Sep.

Practical Models for Predicting Vaginal Intraepithelial Neoplasia in High-Grade Squamous Intraepithelial Lesions Patients within Two years After Conization.

Int J Womens Health. 2025 Aug 13;17:2537-2549. doi: 10.2147/IJWH.S534125. eCollection 2025.

Raising awareness of potential biases in medical machine learning: Experience from a Datathon.

PLOS Digit Health. 2025 Jul 11;4(7):e0000932. doi: 10.1371/journal.pdig.0000932. eCollection 2025 Jul.

Evaluating an alert-based multiparametric algorithm for predicting heart failure hospitalisations in patients with implantable cardioverter-defibrillators: a meta-cohort study.

Open Heart. 2025 Jul 8;12(2):e003474. doi: 10.1136/openhrt-2025-003474.

Continuous heart rate measurements in patients with cardiac disease: Device comparison and development of a novel artefact removal procedure.

Digit Health. 2025 Jun 19;11:20552076251337598. doi: 10.1177/20552076251337598. eCollection 2025 Jan-Dec.

Using Machine learning to predict medication therapy problems among patients with chronic kidney disease.

Am J Nephrol. 2025 Jun 17:1-16. doi: 10.1159/000546540.

Explainable artificial intelligence for stroke risk stratification in atrial fibrillation.

Eur Heart J Digit Health. 2025 Mar 22;6(3):317-325. doi: 10.1093/ehjdh/ztaf019. eCollection 2025 May.

Machine learning allows robust classification of lung neoplasm tissue using an electronic biopsy through minimally-invasive electrical impedance spectroscopy.

Sci Rep. 2025 Mar 21;15(1):9716. doi: 10.1038/s41598-025-94826-0.

Artificial intelligence for the analysis of intracoronary optical coherence tomography images: a systematic review.

Eur Heart J Digit Health. 2025 Jan 28;6(2):270-284. doi: 10.1093/ehjdh/ztaf005. eCollection 2025 Mar.

Artificial intelligence in clinical medicine: a state-of-the-art overview of systematic reviews with methodological recommendations for improved reporting.

Front Digit Health. 2025 Mar 5;7:1550731. doi: 10.3389/fdgth.2025.1550731. eCollection 2025.

References

Stop Explaining Black Box Machine Learning Models for High Stakes Decisions and Use Interpretable Models Instead.

Nat Mach Intell. 2019 May;1(5):206-215. doi: 10.1038/s42256-019-0048-x. Epub 2019 May 13.

Engaging clinicians early during the development of a graphical user display of an intelligent alerting system at the bedside.

Int J Med Inform. 2022 Mar;159:104643. doi: 10.1016/j.ijmedinf.2021.104643. Epub 2021 Nov 11.

The role of machine learning applications in diagnosing and assessing critical and non-critical CHD: a scoping review.

Cardiol Young. 2021 Nov;31(11):1770-1780. doi: 10.1017/S1047951121004212. Epub 2021 Nov 2.

A quality assessment tool for artificial intelligence-centered diagnostic test accuracy studies: QUADAS-AI.

Nat Med. 2021 Oct;27(10):1663-1665. doi: 10.1038/s41591-021-01517-0.

Machine learning with electrocardiograms: A call for guidelines and best practices for 'stress testing' algorithms.

J Electrocardiol. 2021 Nov-Dec;69S:1-6. doi: 10.1016/j.jelectrocard.2021.07.003. Epub 2021 Jul 17.

Protocol for development of a reporting guideline (TRIPOD-AI) and risk of bias tool (PROBAST-AI) for diagnostic and prognostic prediction model studies based on artificial intelligence.

BMJ Open. 2021 Jul 9;11(7):e048008. doi: 10.1136/bmjopen-2020-048008.

Explaining deep neural networks for knowledge discovery in electrocardiogram analysis.

Sci Rep. 2021 May 26;11(1):10949. doi: 10.1038/s41598-021-90285-5.

Association of Clinician Diagnostic Performance With Machine Learning-Based Decision Support Systems: A Systematic Review.

JAMA Netw Open. 2021 Mar 1;4(3):e211276. doi: 10.1001/jamanetworkopen.2021.1276.

Temporal bias in case-control design: preventing reliable predictions of the future.

Nat Commun. 2021 Feb 17;12(1):1107. doi: 10.1038/s41467-021-21390-2.

Clinician checklist for assessing suitability of machine learning applications in healthcare.

BMJ Health Care Inform. 2021 Feb;28(1). doi: 10.1136/bmjhci-2020-100251.
