Department of Development & Regeneration, KU Leuven, Herestraat 49 box 805, Leuven, 3000 Belgium.
Centre for Statistics in Medicine, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, Botnar Research Centre, University of Oxford, Windmill Road, Oxford, OX3 7LD UK.
J Clin Epidemiol. 2019 Jun;110:12-22. doi: 10.1016/j.jclinepi.2019.02.004. Epub 2019 Feb 11.
The objective of this study was to compare the performance of logistic regression (LR) and machine learning (ML) for clinical prediction modeling, as reported in the literature.
We conducted a Medline literature search (1/2016 to 8/2017) and extracted comparisons between LR and ML models for binary outcomes.
We included 71 of 927 studies. The median sample size was 1,250 (range 72-3,994,872), the median number of predictors considered was 19 (range 5-563), and the median number of events per predictor was eight (range 0.3-6,697). The most common ML methods were classification trees, random forests, artificial neural networks, and support vector machines. In 48 (68%) studies, we observed potential bias in the validation procedures. Sixty-four (90%) studies used the area under the receiver operating characteristic curve (AUC) to assess discrimination. Calibration was not addressed in 56 (79%) studies. We identified 282 comparisons between an LR and an ML model (AUC range, 0.52-0.99). For 145 comparisons at low risk of bias, the difference in logit(AUC) between LR and ML was 0.00 (95% confidence interval, -0.18 to 0.18). For 137 comparisons at high risk of bias, logit(AUC) was 0.34 (95% confidence interval, 0.20 to 0.47) higher for ML.
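The comparisons above are expressed as differences in logit(AUC), not in raw AUC. The Python sketch below is illustrative only and is not the authors' analysis code. It computes each model's AUC as a concordance statistic on hypothetical toy data, then takes the difference on the logit scale. The helper names (`auc_concordance`, `logit_auc_difference`, `inv_logit`), the toy data, and the ML-minus-LR sign convention are assumptions. The sketch does not attempt to reproduce how comparisons were pooled across studies.

```python
import math


def auc_concordance(y_true, risk):
    """AUC as the concordance (c) statistic: the share of event/non-event
    pairs in which the event has the higher predicted risk (ties count 0.5)."""
    events = [r for y, r in zip(y_true, risk) if y == 1]
    non_events = [r for y, r in zip(y_true, risk) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    score = 0.0
    for e in events:
        for n in non_events:
            if e > n:
                score += 1.0
            elif e == n:
                score += 0.5
    return score / (len(events) * len(non_events))


def logit(p):
    """Log-odds; undefined when p (here, an AUC) is exactly 0 or 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return math.log(p / (1.0 - p))


def inv_logit(x):
    """Map a log-odds value back to the probability (AUC) scale."""
    return 1.0 / (1.0 + math.exp(-x))


def logit_auc_difference(auc_ml, auc_lr):
    """logit(AUC_ML) - logit(AUC_LR); positive values favour ML (assumed convention)."""
    return logit(auc_ml) - logit(auc_lr)


# Hypothetical toy data: 3 non-events, 3 events, risks from two models.
y = [0, 0, 0, 1, 1, 1]
risk_lr = [0.1, 0.3, 0.6, 0.4, 0.7, 0.8]
risk_ml = [0.2, 0.5, 0.6, 0.4, 0.55, 0.9]

auc_lr = auc_concordance(y, risk_lr)  # 8/9
auc_ml = auc_concordance(y, risk_ml)  # 6/9
# ln(2) - ln(8) = -ln(4): the ML model discriminates worse on this toy data.
print(round(logit_auc_difference(auc_ml, auc_lr), 3))  # -1.386

# A logit(AUC) gain of 0.34 moves a baseline AUC of 0.80 to about 0.849.
print(round(inv_logit(logit(0.80) + 0.34), 3))  # 0.849
```

The logit scale is bounded neither above nor below, which makes differences easier to combine. The same logit gain translates into a smaller raw-AUC change when the baseline AUC is already high. For example, +0.34 lifts 0.80 to about 0.849 but lifts 0.95 only to about 0.964.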
We found no evidence of superior performance of ML over LR. Improvements in methodology and reporting are needed for studies that compare modeling algorithms.