Zhou Wenkao, Zhao Fangli, Qiu Xingqiang, Yang Yujuan, Wang Tingting, Huang Lingyan
Department of Emergency, Xiang'an Hospital of Xiamen University, Xiamen 361100, Fujian, China.
Department of Intensive Care Unit, the Fifth Hospital of Xiamen, Xiamen 361100, Fujian, China. Zhou Wenkao currently works at the Department of Medical Room, Xiang'an Hospital of Xiamen University, Xiamen 361100, Fujian, China. Corresponding author: Huang Lingyan, Email:
Zhonghua Wei Zhong Bing Ji Jiu Yi Xue. 2025 May;37(5):445-451. doi: 10.3760/cma.j.cn121430-20241225-01069.
To identify the optimal machine learning algorithm for predicting post-stroke epilepsy (PSE) within one year following acute stroke, establish a nomogram model based on this algorithm, and perform external validation to achieve accurate prediction of secondary epilepsy.
A total of 870 acute stroke patients admitted to the emergency department of Xiang'an Hospital of Xiamen University from June 2019 to June 2023 were enrolled for model development (model group). An external validation cohort of 435 acute stroke patients admitted to the Fifth Hospital of Xiamen during the same period was used to validate the machine learning algorithms and the nomogram model. Patients were classified into control and epilepsy groups according to whether they developed PSE within one year. Clinical and laboratory data were collected for analysis, including baseline characteristics, stroke location, vascular status, complications, hematologic parameters, and National Institutes of Health Stroke Scale (NIHSS) score. Nine machine learning algorithms (logistic regression, CN2 rule induction, K-nearest neighbors, adaptive boosting, random forest, gradient boosting, support vector machine, naive Bayes, and neural network) were compared for predictive performance. The area under the receiver operating characteristic curve (ROC curve), abbreviated AUC, was used to identify the optimal algorithm. Logistic regression was used to screen risk factors for PSE, and the top 10 predictors were selected to construct the nomogram model. The predictive performance of the model was evaluated using the ROC curve in both the model and validation groups.
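The AUC-based selection step described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names `roc_auc` and `best_algorithm`, the model names, and the toy labels and scores are all hypothetical, chosen only to show how AUC (the probability that a randomly chosen positive case is scored above a randomly chosen negative case) ranks competing algorithms.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def best_algorithm(labels, scores_by_model):
    """Return the model name with the highest AUC, plus all AUCs."""
    aucs = {name: roc_auc(labels, s) for name, s in scores_by_model.items()}
    return max(aucs, key=aucs.get), aucs


# Hypothetical toy data: 1 = developed PSE, 0 = did not (not study data).
toy_labels = [0, 0, 0, 0, 1, 1]
toy_scores = {
    "model_a": [0.1, 0.2, 0.3, 0.4, 0.8, 0.9],
    "model_b": [0.1, 0.6, 0.3, 0.4, 0.5, 0.9],
}
best_name, toy_aucs = best_algorithm(toy_labels, toy_scores)
```

The pairwise form is quadratic in the number of cases, which is acceptable for a cohort of this size; a rank-based formulation would give the same value more efficiently.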
Among the 870 patients in the model group, 29 (3.3%) developed PSE within one year. Among the nine algorithms tested, logistic regression demonstrated the best performance and generalizability, with an AUC of 0.923. Univariate logistic regression identified the following risk factors for PSE: platelet count, white blood cell count, red blood cell count, glycated hemoglobin (HbA1c), C-reactive protein (CRP), triglycerides, high-density lipoprotein (HDL), aspartate aminotransferase (AST), alanine aminotransferase (ALT), activated partial thromboplastin time (APTT), thrombin time, D-dimer, fibrinogen, creatine kinase (CK), creatine kinase-MB (CK-MB), lactate dehydrogenase (LDH), serum sodium, lactic acid, anion gap, NIHSS score, brain herniation, periventricular stroke, and carotid artery plaque. Multivariate logistic regression analysis then showed that white blood cell count, HDL, fibrinogen, lactic acid, and brain herniation were independent risk factors [odds ratios (OR) were 1.837, 198.039, 47.025, 11.559, and 70.722, respectively, all P < 0.05]. In the external validation group, univariate logistic regression analysis showed that platelet count, white blood cell count, CRP, triglycerides, APTT, D-dimer, fibrinogen, CK, CK-MB, LDH, NIHSS score, and brain herniation were risk factors for PSE within one year after acute stroke. Multivariate logistic regression analysis then showed that APTT and brain herniation were independent predictors (OR were 0.587 and 116.193, respectively, both P < 0.05). The nomogram model, constructed using 10 key variables (brain herniation, periventricular stroke, carotid artery plaque, white blood cell count, triglycerides, thrombin time, D-dimer, serum sodium, lactic acid, and NIHSS score), achieved an AUC of 0.908 in the model group and 0.864 in the external validation group.
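As a reading aid for the odds ratios reported above, the following sketch shows the standard relationship between a logistic regression coefficient and its odds ratio (OR = exp(coefficient)). The helper names are hypothetical, and the "per one unit" interpretation is an assumption, since the abstract does not state the measurement units or scaling of each predictor.

```python
import math


def log_odds_coefficient(odds_ratio):
    """Logistic regression coefficient implied by an odds ratio: ln(OR)."""
    return math.log(odds_ratio)


def odds_multiplier(odds_ratio, delta):
    """Factor by which the odds change when the predictor rises by `delta`
    units, holding other predictors fixed (assumes OR is per one unit)."""
    return odds_ratio ** delta


# ORs taken from the abstract; units per predictor are not stated there.
wbc_or = 1.837    # white blood cell count, model group
aptt_or = 0.587   # APTT, external validation group

wbc_beta = log_odds_coefficient(wbc_or)     # positive: odds of PSE rise
aptt_beta = log_odds_coefficient(aptt_or)   # negative: odds of PSE fall
wbc_two_units = odds_multiplier(wbc_or, 2)  # odds factor for a 2-unit rise
```

An OR above 1 corresponds to a positive coefficient and an OR below 1 to a negative one, so under this reading the reported APTT value of 0.587 indicates lower odds of PSE as APTT increases.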
Among the machine learning algorithms compared, logistic regression showed the best predictive performance for epilepsy within one year after acute stroke. The nomogram model built from the logistic regression-derived predictors showed strong discriminative power and was validated in an external cohort, suggesting favorable clinical applicability and generalizability.