Reed Robert A, Morgan Andrei S, Zeitlin Jennifer, Jarreau Pierre-Henri, Torchin Héloïse, Pierrat Véronique, Ancel Pierre-Yves, Khoshnood Babak
Université de Paris, Epidemiology and Statistics Research Center/CRESS, INSERM, INRA, Paris, France.
Elizabeth Garrett Anderson Institute for Women's Health, University College London (UCL), London, United Kingdom.
Front Pediatr. 2021 Feb 3;8:585868. doi: 10.3389/fped.2020.585868. eCollection 2020.
Preterm babies are a vulnerable population that experiences significant short- and long-term morbidity. Rehospitalisation is an important, potentially modifiable adverse event in this population. Improving clinicians' ability to identify the patients at greatest risk of rehospitalisation could improve outcomes and reduce costs. Machine-learning algorithms may offer advantages in prediction over conventional approaches such as logistic regression.
The aim was to compare two machine-learning methods, least absolute shrinkage and selection operator (LASSO) and random forest, with expert-opinion-driven logistic regression modelling for predicting unplanned rehospitalisation within 30 days of discharge in a large French cohort of preterm babies.
Data were derived exclusively from EPIPAGE 2, a population-based prospective cohort study of French preterm babies. Only babies who were discharged home alive and whose parents completed the 1-year survey were eligible for inclusion. All predictive models used a binary outcome denoting whether a baby had an unplanned rehospitalisation within 30 days of discharge. Predictors quantified clinical, treatment, maternal and socio-demographic factors. The predictive abilities of models constructed using the LASSO and random forest algorithms were compared with that of a traditional logistic regression model. The logistic regression model comprised 10 predictors selected by expert clinicians, while the LASSO and random forest models included 75 predictors. Performance measures were derived using 10-fold cross-validation and comprised the area under the receiver operating characteristic curve (AUROC), sensitivity, specificity, Tjur's coefficient of determination and calibration measures.
The rate of 30-day unplanned rehospitalisation in the eligible population used to construct the models was 9.1% (95% CI 8.2-10.1) (350/3,841).
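The reported rate and its interval can be checked from the counts 350/3,841. The abstract does not say how the confidence interval was calculated, so the Wilson score method used in this minimal sketch is an assumption; it reproduces the reported figures after rounding, but other methods may do so as well. The function name `wilson_interval` is illustrative only.

```python
from math import sqrt


def wilson_interval(events: int, n: int, z: float = 1.959964) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (assumed method, not stated in the abstract)."""
    p = events / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


# Counts taken from the abstract: 350 rehospitalised babies out of 3,841 eligible.
rate = 350 / 3841
low, high = wilson_interval(350, 3841)
print(f"{rate:.1%} (95% CI {low:.1%}-{high:.1%})")  # 9.1% (95% CI 8.2%-10.1%)
```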
The random forest model demonstrated both an improved AUROC (0.65; 95% CI 0.59-0.7; P = 0.03) and improved specificity vs. logistic regression (AUROC 0.57; 95% CI 0.51-0.62; P = 0.04). The LASSO performed similarly to logistic regression (AUROC 0.59; 95% CI 0.53-0.65; P = 0.68).
Compared to an expert-specified logistic regression model, random forest offered improved prediction of 30-day unplanned rehospitalisation in preterm babies. However, all models offered relatively low predictive ability, regardless of modelling method.
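The comparison described above (a 10-predictor logistic regression against LASSO and random forest models with 75 predictors, evaluated by 10-fold cross-validation) could be sketched roughly as follows. This is not the authors' code: it assumes scikit-learn, uses synthetic data in place of EPIPAGE 2, treats the first 10 columns as a hypothetical expert-selected set, stands in an L1-penalised logistic regression with an arbitrary penalty strength for the LASSO, and classifies at a threshold equal to the outcome prevalence, which the abstract does not specify. Calibration measures are omitted.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data: 75 predictors, roughly 9% positive outcomes.
X, y = make_classification(n_samples=2000, n_features=75, n_informative=8,
                           weights=[0.91], random_state=0)
all_cols = list(range(75))
expert_cols = list(range(10))  # hypothetical "expert-selected" predictors

models = {
    "logistic regression (10 predictors)": (
        make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
        expert_cols),
    "LASSO (75 predictors)": (
        # L1-penalised logistic regression; C=0.1 is an arbitrary choice.
        make_pipeline(StandardScaler(),
                      LogisticRegression(penalty="l1", solver="liblinear", C=0.1)),
        all_cols),
    "random forest (75 predictors)": (
        RandomForestClassifier(n_estimators=100, random_state=0),
        all_cols),
}

# The same 10 stratified folds are reused for every model.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)


def evaluate(model, cols):
    """Out-of-fold predicted probabilities and summary performance measures."""
    prob = cross_val_predict(model, X[:, cols], y, cv=cv,
                             method="predict_proba")[:, 1]
    pred = (prob >= y.mean()).astype(int)  # assumed threshold: outcome prevalence
    tn, fp, fn, tp = confusion_matrix(y, pred, labels=[0, 1]).ravel()
    return {
        "auroc": roc_auc_score(y, prob),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        # Tjur's coefficient: mean predicted risk in cases minus that in non-cases.
        "tjur_r2": prob[y == 1].mean() - prob[y == 0].mean(),
    }


results = {name: evaluate(model, cols) for name, (model, cols) in models.items()}
for name, metrics in results.items():
    print(name, {k: round(float(v), 3) for k, v in metrics.items()})
```

Because the data are synthetic, the printed values say nothing about the study's results; the sketch only shows how the listed performance measures relate to out-of-fold predictions.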