Sufriyana Herdiantri, Husnayain Atina, Chen Ya-Lin, Kuo Chao-Yang, Singh Onkar, Yeh Tso-Yang, Wu Yu-Wei, Su Emily Chia-Yu
Graduate Institute of Biomedical Informatics, College of Medical Science and Technology, Taipei Medical University, Taipei, Taiwan.
Department of Medical Physiology, College of Medicine, University of Nahdlatul Ulama Surabaya, Surabaya, Indonesia.
JMIR Med Inform. 2020 Nov 17;8(11):e16503. doi: 10.2196/16503.
Predictions in pregnancy care are complex because of interactions among multiple factors. Hence, pregnancy outcomes are not easily predicted by a single predictor using only one algorithm or modeling method.
This study aims to review and compare the predictive performance of logistic regression (LR) with that of other machine learning algorithms used to develop or validate multivariable prognostic prediction models for pregnancy care that inform clinicians' decision making.
Research articles from MEDLINE, Scopus, Web of Science, and Google Scholar were reviewed following several guidelines for prognostic prediction studies, including a risk of bias (ROB) assessment. We report the results according to the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines. Study eligibility was primarily framed using PICOTS (population, index, comparator, outcomes, timing, and setting): Population: men or women in procreative management, pregnant women, and fetuses or newborns; Index: multivariable prognostic prediction models using non-LR algorithms for risk classification to inform clinicians' decision making; Comparator: models applying LR; Outcomes: pregnancy-related outcomes of procreation or pregnancy outcomes for pregnant women and fetuses or newborns; Timing: pre-, inter-, and peripregnancy periods (predictors), pregnancy, delivery, and either the puerperal or neonatal period (outcomes), and either short- or long-term prognoses (time interval); and Setting: primary care or hospital. The results were synthesized by reporting study characteristics and ROBs and by random effects modeling of the difference in the logit area under the receiver operating characteristic curve (AUROC) of each non-LR model compared with the LR model for the same pregnancy outcome. We also reported between-study heterogeneity using τ² and I².
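The synthesis step described above (random-effects pooling of per-study differences in logit AUROC, with τ² and I² for heterogeneity) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: the abstract does not name the τ² estimator, so the DerSimonian-Laird estimator is assumed; the standard error of each logit AUROC is obtained by the delta method; and the LR and non-LR AUROCs from the same study are treated as independent, which ignores their covariance when both models are evaluated on the same data. The class `StudyPair` and all input numbers are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass
class StudyPair:
    """Hypothetical per-study input: AUROC and its SE for a non-LR model and its LR comparator."""
    auc_nonlr: float
    se_nonlr: float
    auc_lr: float
    se_lr: float


def logit(p: float) -> float:
    return math.log(p / (1.0 - p))


def logit_se(auc: float, se: float) -> float:
    # Delta method: d/dp logit(p) = 1 / (p * (1 - p))
    return se / (auc * (1.0 - auc))


def pool_random_effects(studies: Sequence[StudyPair]) -> dict:
    """DerSimonian-Laird random-effects pooling of logit-AUROC differences (non-LR minus LR)."""
    y, v = [], []
    for s in studies:
        y.append(logit(s.auc_nonlr) - logit(s.auc_lr))
        # Assumption: the two AUROCs are independent, so their variances simply add
        v.append(logit_se(s.auc_nonlr, s.se_nonlr) ** 2 + logit_se(s.auc_lr, s.se_lr) ** 2)

    # Fixed-effect (inverse-variance) weights and Cochran's Q
    w = [1.0 / vi for vi in v]
    sw = sum(w)
    y_fixed = sum(wi * yi for wi, yi in zip(w, y)) / sw
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, y))
    df = len(y) - 1

    # Between-study variance tau^2 (DerSimonian-Laird) and I^2
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0

    # Random-effects weights add tau^2 to each within-study variance
    w_re = [1.0 / (vi + tau2) for vi in v]
    est = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return {"estimate": est, "ci": (est - 1.96 * se, est + 1.96 * se), "tau2": tau2, "i2": i2}


# Hypothetical example with three studies comparing a non-LR model against LR
studies = [
    StudyPair(0.90, 0.02, 0.80, 0.03),
    StudyPair(0.85, 0.03, 0.82, 0.03),
    StudyPair(0.95, 0.01, 0.75, 0.04),
]
print(pool_random_effects(studies))
```

Working on the logit scale keeps the pooled estimate and its confidence limits unbounded, whereas raw AUROC differences are constrained by the 0 to 1 range of each AUROC.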
Of the 2093 records, we included 142 studies in the systematic review and 62 studies in the meta-analysis. Most prediction models used LR (92/142, 64.8%); among non-LR algorithms, artificial neural networks were the most common (20/142, 14.1%). Only 16.9% (24/142) of studies had a low ROB. Two non-LR algorithms from low-ROB studies significantly outperformed LR. The first was random forest, for preterm delivery (difference in logit AUROC 2.51, 95% CI 1.49-3.53; I²=86%; τ²=0.77) and pre-eclampsia (difference in logit AUROC 1.2, 95% CI 0.72-1.67; I²=75%; τ²=0.09). The second was gradient boosting, for cesarean section (difference in logit AUROC 2.26, 95% CI 1.39-3.13; I²=75%; τ²=0.43) and gestational diabetes (difference in logit AUROC 1.03, 95% CI 0.69-1.37; I²=83%; τ²=0.07).
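Because the pooled values are differences on the logit scale, what they mean on the AUROC scale depends on how well the LR comparator already performs. The sketch below adds a pooled difference to an assumed LR baseline AUROC and back-transforms the result. The baseline of 0.70 is hypothetical and not taken from the review, and the function names are illustrative only.

```python
import math


def logit(p: float) -> float:
    return math.log(p / (1.0 - p))


def inv_logit(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def implied_auroc(baseline_lr_auroc: float, logit_diff: float) -> float:
    """AUROC implied for the non-LR model when a logit difference is added to an assumed LR baseline."""
    return inv_logit(logit(baseline_lr_auroc) + logit_diff)


# Hypothetical LR baseline of 0.70, combined with the pooled random forest
# difference for preterm delivery (2.51) and its 95% CI bounds (1.49, 3.53).
for d in (1.49, 2.51, 3.53):
    print(d, round(implied_auroc(0.70, d), 3))
```

With a higher baseline, the same logit difference corresponds to a smaller absolute gain in AUROC, because the inverse logit flattens as it approaches 1.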
The best-performing prediction models across studies did not necessarily use LR; models using random forest and gradient boosting also performed well. We recommend reanalyzing existing LR models for several pregnancy outcomes by comparing them with these algorithms in reanalyses that apply standard guidelines.
PROSPERO (International Prospective Register of Systematic Reviews) CRD42019136106; https://www.crd.york.ac.uk/prospero/display_record.php?RecordID=136106.