Emergency Medicine Department of the Affiliated Hospital of Xuzhou Medical University, Xuzhou, Jiangsu Province 221002, China; College of Computer Science and Technology, Jilin University, Changchun, Jilin Province 130000, China.
Emergency Medicine Department of the Affiliated Hospital of Xuzhou Medical University, Xuzhou, Jiangsu Province 221002, China; Laboratory of Emergency Medicine, Second Clinical Medical College of Xuzhou Medical University, Xuzhou, Jiangsu Province 221002, China.
Int J Med Inform. 2023 Jun;174:105050. doi: 10.1016/j.ijmedinf.2023.105050. Epub 2023 Mar 21.
Stroke is the second leading cause of death worldwide and has a high recurrence rate. We aimed to identify risk factors for stroke recurrence and to develop an interpretable machine learning model that predicts 30-day readmission after stroke.
Patients with stroke recorded in the electronic health records (EHRs) of Xuzhou Medical University Hospital between February 1, 2021, and November 30, 2021, were included in the study; deceased patients were excluded. We extracted 74 features from the EHRs and used the 20 features with the highest chi-square values to build machine learning models. The models were trained on 80% of the patients, and the remaining 20% were kept as a holdout dataset for validation. The SHapley Additive exPlanations (SHAP) method was used to interpret the model.
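The feature ranking and the 80/20 split described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes binary (0/1) features and labels, uses a plain Pearson chi-square statistic on a 2x2 table, and stratifies the split by outcome, which the abstract does not state. The function names `chi2_binary`, `top_k_features`, and `stratified_split` are hypothetical.

```python
import random
from collections import Counter


def chi2_binary(feature, label):
    """Pearson chi-square statistic for a binary feature vs. a binary label."""
    n = len(feature)
    counts = Counter(zip(feature, label))
    stat = 0.0
    for f in (0, 1):
        for y in (0, 1):
            row_total = counts[(f, 0)] + counts[(f, 1)]
            col_total = counts[(0, y)] + counts[(1, y)]
            expected = row_total * col_total / n
            if expected > 0:  # skip empty rows/columns
                stat += (counts[(f, y)] - expected) ** 2 / expected
    return stat


def top_k_features(columns, label, k):
    """Return the names of the k features with the largest chi-square value."""
    scores = {name: chi2_binary(values, label) for name, values in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


def stratified_split(label, test_fraction=0.2, seed=0):
    """Split row indices into train/holdout, keeping the class ratio in both.

    Stratification is an assumption made here because readmissions are rare
    (about 2% of the cohort); the abstract only reports an 80/20 split.
    """
    rng = random.Random(seed)
    train, test = [], []
    for cls in sorted(set(label)):
        idx = [i for i, y in enumerate(label) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return sorted(train), sorted(test)
```

In the study's setting, `top_k_features(columns, label, 20)` would correspond to keeping the top 20 of the 74 extracted features. Continuous laboratory values would first need binning or a different statistic, which this sketch does not cover.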
The cohort included 6,558 patients with a mean (SD) age of 65 (11) years; 3,926 (59.86%) were male, and 132 (2.01%) were readmitted within 30 days. The area under the receiver operating characteristic curve (AUROC) for the optimized model was 0.80 (95% CI 0.68-0.80). Using the SHAP method, we identified the top 10 risk factors: severe carotid artery stenosis, weakness, homocysteine, glycosylated hemoglobin, sex, lymphocyte percentage, neutrophilic granulocyte percentage, urine glucose, fresh cerebral infarction, and red blood cell count. The AUROC of a model built on these 10 features was 0.80 (95% CI 0.69-0.80), not significantly different from that of the model with 20 features.
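For readers who want to see how an AUROC and its confidence interval can be computed on a holdout set, the following is a minimal sketch. It assumes a percentile bootstrap for the interval; the abstract does not say which method the authors used, and the function names `auroc` and `bootstrap_ci` are hypothetical.

```python
import random


def auroc(labels, scores):
    """AUROC as the probability that a random positive case scores higher
    than a random negative case, with ties counted as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the AUROC (an assumed method)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain only one class
            stats.append(auroc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int(alpha / 2 * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper
```

With only 132 readmissions in the cohort, a 20% holdout contains few positive cases, so an interval estimated this way would be expected to be wide.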
Our method performed well in predicting 30-day readmission after stroke and also revealed risk factors that provide valuable insights for treatment.