Julius Centre for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, Netherlands
Cochrane Netherlands, University Medical Center Utrecht, Utrecht University, Utrecht, Netherlands.
BMJ. 2021 Oct 20;375:n2281. doi: 10.1136/bmj.n2281.
To assess the methodological quality of studies on prediction models developed using machine learning techniques across all medical specialties.
Systematic review.
PubMed from 1 January 2018 to 31 December 2019.
Articles reporting on the development, with or without external validation, of a multivariable prediction model (diagnostic or prognostic) developed using supervised machine learning for individualised predictions. No restrictions were applied to study design, data source, or predicted patient related health outcomes.
Methodological quality of the studies was assessed and risk of bias evaluated using the prediction model risk of bias assessment tool (PROBAST). This tool contains 21 signalling questions tailored to identify potential sources of bias. Risk of bias was rated for each of four domains (participants, predictors, outcome, and analysis) and for each study overall.
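The per-domain and overall ratings described above follow the published PROBAST guidance: a study is rated at low risk of bias overall only when every domain is rated low, at high risk when any domain is rated high, and unclear otherwise. A minimal sketch of that aggregation rule (the function name and data structure are illustrative, not part of the tool itself):

```python
# Sketch of the PROBAST overall judgment rule, assuming per-domain ratings
# have already been made ("low", "high", or "unclear").
DOMAINS = ("participants", "predictors", "outcome", "analysis")

def overall_probast_rating(domain_ratings: dict) -> str:
    """Combine the four PROBAST domain ratings into an overall judgment:
    high if any domain is high, low only if all domains are low,
    otherwise unclear."""
    ratings = [domain_ratings[d] for d in DOMAINS]
    if any(r == "high" for r in ratings):
        return "high"
    if all(r == "low" for r in ratings):
        return "low"
    return "unclear"

# Example: a study with one high-risk domain is high risk overall.
print(overall_probast_rating(
    {"participants": "low", "predictors": "low",
     "outcome": "unclear", "analysis": "high"}))  # -> high
```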
152 studies were included: 58 (38%) included a diagnostic prediction model and 94 (62%) a prognostic prediction model. PROBAST was applied to 152 developed models and 19 external validations. Of these 171 analyses, 148 (87%, 95% confidence interval 81% to 91%) were rated at high risk of bias. The analysis domain was most frequently rated at high risk of bias. Of the 152 models, 85 (56%, 48% to 64%) were developed with an inadequate number of events per candidate predictor, 62 (41%, 33% to 49%) handled missing data inadequately, and 59 (39%, 31% to 47%) assessed overfitting improperly. Most models used appropriate data sources to develop (73%, 66% to 79%) and externally validate (74%, 51% to 88%) the machine learning based prediction models. Information about blinding of outcome and blinding of predictors was, however, absent for 60 (40%, 32% to 47%) and 79 (52%, 44% to 60%) of the developed models, respectively.
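The "events per candidate predictor" criterion flagged in the results is a simple ratio: the number of outcome events in the development data divided by the number of candidate predictors considered. A common rule of thumb for regression-based models is a ratio of at least 10, with substantially more recommended for flexible machine learning methods. A minimal sketch, with illustrative numbers that are not taken from the review:

```python
def events_per_predictor(n_events: int, n_candidate_predictors: int) -> float:
    """Events per candidate predictor (EPV): outcome events divided by the
    number of candidate predictors screened during model development."""
    if n_candidate_predictors <= 0:
        raise ValueError("need at least one candidate predictor")
    return n_events / n_candidate_predictors

# Hypothetical development dataset: 120 outcome events, 30 candidate predictors.
epv = events_per_predictor(120, 30)
print(epv)  # -> 4.0, well below the conventional minimum of 10
```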
Most studies on machine learning based prediction models show poor methodological quality and are at high risk of bias. Factors contributing to risk of bias include small study size, poor handling of missing data, and failure to deal with overfitting. Efforts to improve the design, conduct, reporting, and validation of such studies are necessary to boost the application of machine learning based prediction models in clinical practice.
PROSPERO CRD42019161764.