
A systematic review shows no performance benefit of machine learning over logistic regression for clinical prediction models.

Author information

Department of Development & Regeneration, KU Leuven, Herestraat 49 box 805, Leuven, 3000 Belgium.

Centre for Statistics in Medicine, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, Botnar Research Centre, University of Oxford, Windmill Road, Oxford, OX3 7LD UK.

Publication information

J Clin Epidemiol. 2019 Jun;110:12-22. doi: 10.1016/j.jclinepi.2019.02.004. Epub 2019 Feb 11.

DOI:10.1016/j.jclinepi.2019.02.004

PMID:30763612

Abstract

OBJECTIVES

The objective of this study was to compare performance of logistic regression (LR) with machine learning (ML) for clinical prediction modeling in the literature.

STUDY DESIGN AND SETTING

We conducted a Medline literature search (1/2016 to 8/2017) and extracted comparisons between LR and ML models for binary outcomes.

RESULTS

We included 71 of 927 studies. The median sample size was 1,250 (range 72-3,994,872), with 19 predictors considered (range 5-563) and eight events per predictor (range 0.3-6,697). The most common ML methods were classification trees, random forests, artificial neural networks, and support vector machines. In 48 (68%) studies, we observed potential bias in the validation procedures. Sixty-four (90%) studies used the area under the receiver operating characteristic curve (AUC) to assess discrimination. Calibration was not addressed in 56 (79%) studies. We identified 282 comparisons between an LR and ML model (AUC range, 0.52-0.99). For 145 comparisons at low risk of bias, the difference in logit(AUC) between LR and ML was 0.00 (95% confidence interval, -0.18 to 0.18). For 137 comparisons at high risk of bias, logit(AUC) was 0.34 (0.20-0.47) higher for ML.
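The pooled differences above are reported on the logit(AUC) scale, which can be hard to read directly. The following is a minimal sketch, not the authors' analysis code, of how a logit-scale difference maps back to the AUC scale and how events per predictor is computed. The function names and the baseline AUC values are hypothetical and chosen only for illustration; the only figure taken from the abstract is the 0.34 point estimate.

```python
import math


def logit(p: float) -> float:
    """Return the log-odds of a proportion p, with 0 < p < 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return math.log(p / (1.0 - p))


def inv_logit(x: float) -> float:
    """Map a log-odds value back to a proportion in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def shift_auc(auc: float, delta_logit: float) -> float:
    """Apply a difference on the logit(AUC) scale to a baseline AUC."""
    return inv_logit(logit(auc) + delta_logit)


def events_per_predictor(n_events: int, n_predictors: int) -> float:
    """Number of outcome events divided by the number of candidate predictors."""
    if n_predictors <= 0:
        raise ValueError("n_predictors must be positive")
    return n_events / n_predictors


if __name__ == "__main__":
    # Point estimate reported for comparisons at high risk of bias.
    delta = 0.34
    # Hypothetical baseline LR AUCs (assumed values, not from the review).
    for baseline in (0.60, 0.70, 0.80, 0.90):
        shifted = shift_auc(baseline, delta)
        print(f"LR AUC {baseline:.2f} -> ML AUC {shifted:.3f}")
```

Under these assumptions, a baseline AUC of 0.80 moves to roughly 0.85 when 0.34 is added on the logit scale, while a difference of 0.00 leaves the AUC unchanged. The same logit difference corresponds to a smaller absolute AUC change when the baseline is close to 1.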

CONCLUSION

We found no evidence of superior performance of ML over LR. Improvements in methodology and reporting are needed for studies that compare modeling algorithms.
