Darabi Negar, Hosseinichimeh Niyousha, Noto Anthony, Zand Ramin, Abedi Vida
Department of Industrial and Systems Engineering, Virginia Tech, Falls Church, VA, United States.
Geisinger Neuroscience Institute, Geisinger Health System, Danville, PA, United States.
Front Neurol. 2021 Mar 31;12:638267. doi: 10.3389/fneur.2021.638267. eCollection 2021.
Hospital readmissions impose a substantial burden on the healthcare system. Because stroke is associated with a high rate of readmission, reducing readmissions after stroke could improve quality of care. The goal of this study is to enhance our understanding of the predictors of 30-day readmission after ischemic stroke and to develop models that identify high-risk individuals for targeted interventions. We used patient-level data from electronic health records (EHR), five machine learning algorithms (random forest, gradient boosting machine, extreme gradient boosting [XGBoost], support vector machine, and logistic regression [LR]), a data-driven feature selection strategy, and adaptive sampling to develop 15 models of 30-day readmission after ischemic stroke. We further identified important clinical variables. We included 3,184 patients with ischemic stroke (mean age: 71 ± 13.90 years; men: 51.06%). Among the 61 clinical variables included in the model, a National Institutes of Health Stroke Scale score above 24, insertion of an indwelling urinary catheter, hypercoagulable state, and percutaneous gastrostomy had the highest importance scores. The model's area under the curve (AUC) for predicting 30-day readmission was 0.74 (95% CI: 0.64-0.78), with a positive predictive value (PPV) of 0.43, when the XGBoost algorithm was used with ROSE-sampling. The sampling strategy improved the balance between specificity and sensitivity. The best sensitivity was achieved with LR when optimized with feature selection and ROSE-sampling (AUC: 0.64, sensitivity: 0.53, specificity: 0.69). Machine learning-based models can be designed to predict 30-day readmission after stroke using structured data from EHR. Among the algorithms analyzed, XGBoost with ROSE-sampling had the best performance in terms of AUC, while LR with ROSE-sampling and feature selection had the best sensitivity. Clinical variables highly associated with 30-day readmission could be targeted for personalized interventions. Depending on a healthcare system's resources and criteria, models with optimized performance metrics can be implemented to improve outcomes.
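The abstract reports AUC, sensitivity, specificity, and PPV. For readers less familiar with these metrics, the following is a minimal sketch of how they are commonly computed. It is not the authors' code: the class and function names are hypothetical, and the counts and scores are invented for illustration and are not taken from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts from comparing predicted and observed 30-day readmission."""
    tp: int  # readmitted and flagged high-risk
    fp: int  # not readmitted but flagged high-risk
    tn: int  # not readmitted and not flagged
    fn: int  # readmitted but not flagged


def sensitivity(c: ConfusionCounts) -> float:
    # Share of readmitted patients the model flagged.
    return c.tp / (c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    # Share of non-readmitted patients the model left unflagged.
    return c.tn / (c.tn + c.fp)


def ppv(c: ConfusionCounts) -> float:
    # Share of flagged patients who were actually readmitted.
    return c.tp / (c.tp + c.fp)


def auc(pos_scores: list[float], neg_scores: list[float]) -> float:
    # Probability that a randomly chosen readmitted patient receives a
    # higher risk score than a randomly chosen non-readmitted patient
    # (ties count as one half). Equivalent to the area under the ROC curve.
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical counts and risk scores, for illustration only.
example = ConfusionCounts(tp=40, fp=60, tn=240, fn=60)
example_auc = auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2])

if __name__ == "__main__":
    print(f"sensitivity={sensitivity(example):.2f}")
    print(f"specificity={specificity(example):.2f}")
    print(f"PPV={ppv(example):.2f}")
    print(f"AUC={example_auc:.3f}")
```

Sensitivity and specificity both depend on the decision threshold, whereas AUC summarizes ranking quality across all thresholds. This is why one model can lead on AUC while another leads on sensitivity, as the abstract reports.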