Risk of bias in studies on prediction models developed using supervised machine learning techniques: systematic review.

Affiliations

Julius Centre for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, Netherlands

Cochrane Netherlands, University Medical Center Utrecht, Utrecht University, Utrecht, Netherlands.

Publication information

BMJ. 2021 Oct 20;375:n2281. doi: 10.1136/bmj.n2281.

DOI: 10.1136/bmj.n2281

PMID: 34670780

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8527348/

Abstract

OBJECTIVE

To assess the methodological quality of studies on prediction models developed using machine learning techniques across all medical specialties.

DESIGN

Systematic review.

DATA SOURCES

PubMed from 1 January 2018 to 31 December 2019.

ELIGIBILITY CRITERIA

Articles reporting on the development, with or without external validation, of a multivariable prediction model (diagnostic or prognostic) developed using supervised machine learning for individualised predictions. No restrictions applied for study design, data source, or predicted patient related health outcomes.

REVIEW METHODS

Methodological quality of the studies was determined and risk of bias evaluated using the prediction risk of bias assessment tool (PROBAST). This tool contains 21 signalling questions tailored to identify potential biases in four domains. Risk of bias was measured for each domain (participants, predictors, outcome, and analysis) and each study (overall).
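The step from domain ratings to an overall rating can be sketched as follows. This is a minimal illustration that assumes the usual PROBAST convention (overall high if any domain is high, overall low only if all four domains are low, otherwise unclear); the abstract itself does not spell out the rule, and the function and variable names are hypothetical.

```python
from typing import Dict

# The four PROBAST domains named in the review methods.
DOMAINS = ("participants", "predictors", "outcome", "analysis")
VALID_RATINGS = ("low", "high", "unclear")


def overall_risk_of_bias(domain_ratings: Dict[str, str]) -> str:
    """Combine four domain ratings into one overall rating.

    Assumed rule (not stated in the abstract): any "high" gives "high",
    all "low" gives "low", anything else gives "unclear".
    """
    missing = [d for d in DOMAINS if d not in domain_ratings]
    if missing:
        raise ValueError(f"missing domain ratings: {missing}")
    ratings = [domain_ratings[d] for d in DOMAINS]
    invalid = [r for r in ratings if r not in VALID_RATINGS]
    if invalid:
        raise ValueError(f"invalid ratings: {invalid}")
    if "high" in ratings:
        return "high"
    if "unclear" in ratings:
        return "unclear"
    return "low"


# Hypothetical study: only the analysis domain is rated high.
example = {
    "participants": "low",
    "predictors": "low",
    "outcome": "unclear",
    "analysis": "high",
}
print(overall_risk_of_bias(example))
```

Under this rule a single high-risk domain is enough to rate the whole analysis at high risk, which is consistent with the analysis domain driving most of the overall high ratings.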

RESULTS

152 studies were included: 58 (38%) included a diagnostic prediction model and 94 (62%) a prognostic prediction model. PROBAST was applied to 152 developed models and 19 external validations. Of these 171 analyses, 148 (87%, 95% confidence interval 81% to 91%) were rated at high risk of bias. The analysis domain was most frequently rated at high risk of bias. Of the 152 models, 85 (56%, 48% to 64%) were developed with an inadequate number of events per candidate predictor, 62 handled missing data inadequately (41%, 33% to 49%), and 59 assessed overfitting improperly (39%, 31% to 47%). Most models used appropriate data sources to develop (73%, 66% to 79%) and externally validate the machine learning based prediction models (74%, 51% to 88%). Information about blinding of outcome and blinding of predictors was, however, absent in 60 (40%, 32% to 47%) and 79 (52%, 44% to 60%) of the developed models, respectively.
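The percentages and intervals above can be checked from the raw counts. The abstract does not say which interval method was used; the sketch below assumes a Wilson score interval, which appears to reproduce the reported figures after rounding. The helper name `wilson_interval` is illustrative only.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, total: int, z: float = 1.959964) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (default z gives about 95%)."""
    if total <= 0 or not 0 <= successes <= total:
        raise ValueError("need 0 <= successes <= total and total > 0")
    p = successes / total
    denom = 1 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return centre - half_width, centre + half_width


# Counts taken from the RESULTS paragraph.
reported = [
    ("analyses at high risk of bias", 148, 171),
    ("inadequate events per candidate predictor", 85, 152),
    ("missing data handled inadequately", 62, 152),
]
for label, k, n in reported:
    low, high = wilson_interval(k, n)
    print(f"{label}: {k}/{n} = {k / n:.0%} ({low:.0%} to {high:.0%})")
```

The Wilson interval is a common choice for proportions because it stays within 0% to 100% and behaves well for small denominators such as the 19 external validations.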

CONCLUSION

Most studies on machine learning based prediction models show poor methodological quality and are at high risk of bias. Factors contributing to risk of bias include small study size, poor handling of missing data, and failure to deal with overfitting. Efforts to improve the design, conduct, reporting, and validation of such studies are necessary to boost the application of machine learning based prediction models in clinical practice.
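As a rough illustration of the study size problem, the number of events per candidate predictor can be computed as below. This is a sketch under stated assumptions: the counts are invented, the function names are hypothetical, and the threshold of 10 is only a common rule of thumb, not a cutoff taken from this abstract.

```python
def events_per_candidate_predictor(n_events: int, n_non_events: int,
                                   n_candidate_parameters: int) -> float:
    """Events in the smaller outcome group divided by candidate predictor parameters."""
    if n_candidate_parameters <= 0:
        raise ValueError("n_candidate_parameters must be positive")
    if n_events < 0 or n_non_events < 0:
        raise ValueError("counts must be non-negative")
    return min(n_events, n_non_events) / n_candidate_parameters


def flag_low_epv(epv: float, threshold: float = 10.0) -> bool:
    """True if the ratio falls below the assumed rule-of-thumb threshold."""
    return epv < threshold


# Invented example: 60 events among 1000 participants, 20 candidate parameters.
epv = events_per_candidate_predictor(n_events=60, n_non_events=940,
                                     n_candidate_parameters=20)
print(epv, flag_low_epv(epv))
```

Flexible machine learning methods may need considerably more events per predictor than such a rule of thumb suggests, so a ratio above the threshold should not be read as proof of adequate sample size.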

SYSTEMATIC REVIEW REGISTRATION

PROSPERO CRD42019161764.

Similar articles

Risk of bias in studies on prediction models developed using supervised machine learning techniques: systematic review.

BMJ. 2021 Oct 20;375:n2281. doi: 10.1136/bmj.n2281.

Prognostic models for predicting clinical disease progression, worsening and activity in people with multiple sclerosis.

Cochrane Database Syst Rev. 2023 Sep 8;9(9):CD013606. doi: 10.1002/14651858.CD013606.pub2.

Folic acid supplementation and malaria susceptibility and severity among people taking antifolate antimalarial drugs in endemic areas.

Cochrane Database Syst Rev. 2022 Feb 1;2(2022):CD014217. doi: 10.1002/14651858.CD014217.

Reporting and risk of bias of prediction models based on machine learning methods in preterm birth: A systematic review.

Acta Obstet Gynecol Scand. 2023 Jan;102(1):7-14. doi: 10.1111/aogs.14475. Epub 2022 Nov 17.

Protocol for a systematic review on the methodological and reporting quality of prediction model studies using machine learning techniques.

BMJ Open. 2020 Nov 11;10(11):e038832. doi: 10.1136/bmjopen-2020-038832.

Completeness of reporting of clinical prediction models developed using supervised machine learning: a systematic review.

BMC Med Res Methodol. 2022 Jan 13;22(1):12. doi: 10.1186/s12874-021-01469-6.

Methodological conduct of prognostic prediction models developed using machine learning in oncology: a systematic review.

BMC Med Res Methodol. 2022 Apr 8;22(1):101. doi: 10.1186/s12874-022-01577-x.

Prognostic models for newly-diagnosed chronic lymphocytic leukaemia in adults: a systematic review and meta-analysis.

Cochrane Database Syst Rev. 2020 Jul 31;7(7):CD012022. doi: 10.1002/14651858.CD012022.pub2.

Systematic review identifies the design and methodological conduct of studies on machine learning-based prediction models.

J Clin Epidemiol. 2023 Feb;154:8-22. doi: 10.1016/j.jclinepi.2022.11.015. Epub 2022 Nov 25.

Protocol for development of a reporting guideline (TRIPOD-AI) and risk of bias tool (PROBAST-AI) for diagnostic and prognostic prediction model studies based on artificial intelligence.

BMJ Open. 2021 Jul 9;11(7):e048008. doi: 10.1136/bmjopen-2020-048008.

Cited by

Type 2 Diabetes Prediction Model in China: A Five-Year Systematic Review.

Healthcare (Basel). 2025 Aug 15;13(16):2007. doi: 10.3390/healthcare13162007.

Diagnostic Prediction Models for Primary Care, Based on AI and Electronic Health Records: Systematic Review.

JMIR Med Inform. 2025 Aug 22;13:e62862. doi: 10.2196/62862.

AI and Machine Learning Terminology in Medicine, Psychology, and Social Sciences: Tutorial and Practical Recommendations.

J Med Internet Res. 2025 Aug 18;27:e66100. doi: 10.2196/66100.

Using a large language model (ChatGPT) to assess risk of bias in randomized controlled trials of medical interventions: protocol for a pilot study of interrater agreement with human reviewers.

BMC Med Res Methodol. 2025 Jul 31;25(1):182. doi: 10.1186/s12874-025-02631-0.

TRIPOD+AI statement: updated guidance for reporting clinical prediction models that use regression or machine learning methods: a Korean translation.

Ewha Med J. 2025 Jul;48(3):e48. doi: 10.12771/emj.2025.00668. Epub 2025 Jul 31.

Operationalization of Artificial Intelligence Applications in the Intensive Care Unit: A Systematic Review.

JAMA Netw Open. 2025 Jul 1;8(7):e2522866. doi: 10.1001/jamanetworkopen.2025.22866.

Federated Learning-Based Model for Predicting Mortality: Systematic Review and Meta-Analysis.

J Med Internet Res. 2025 Jul 21;27:e65708. doi: 10.2196/65708.

A practical guide for nephrologist peer reviewers: evaluating artificial intelligence and machine learning research in nephrology.

Ren Fail. 2025 Dec;47(1):2513002. doi: 10.1080/0886022X.2025.2513002. Epub 2025 Jul 7.

Methodological conduct and risk of bias in studies on prenatal birthweight prediction models using machine learning techniques: a systematic review.

BMC Pregnancy Childbirth. 2025 Jul 2;25(1):696. doi: 10.1186/s12884-025-07727-5.

Implementation and Updating of Clinical Prediction Models: A Systematic Review.

Mayo Clin Proc Digit Health. 2025 May 23;3(3):100228. doi: 10.1016/j.mcpdig.2025.100228. eCollection 2025 Sep.
