Department of Computer Science and Information Technologies, Faculty of Computer Science, CITIC-Research Center of Information and Communication Technologies, Universidade da Coruña, A Coruña, Spain.
Grupo de Redes de Neuronas Artificiales y Sistemas Adaptativos. Imagen Médica y Diagnóstico Radiológico (RNASA-IMEDIR). Instituto de Investigación Biomédica de A Coruña (INIBIC). Complexo Hospitalario Universitario de A Coruña (CHUAC), SERGAS, Universidade da Coruña, A Coruña, Spain.
Sci Rep. 2021 May 12;11(1):10071. doi: 10.1038/s41598-021-89434-7.
We investigated the clinical, biochemical and neuroimaging factors associated with the outcome of stroke patients in order to build a machine learning model that predicts mortality and morbidity 3 months after admission. The dataset consisted of patients with ischemic stroke (IS) or non-traumatic intracerebral hemorrhage (ICH) who were admitted to the Stroke Unit of a European tertiary hospital and prospectively registered. We identified the main variables for a Random Forest (RF) classifier and generated a predictive model that estimates patient mortality/morbidity in three groups: (1) IS + ICH, (2) IS, and (3) ICH. A total of 6022 patients were included: 4922 (mean age 71.9 ± 13.8 years) with IS and 1100 (mean age 73.3 ± 13.1 years) with ICH. NIHSS at 24 and 48 h and axillary temperature at admission were the most important variables for predicting patient outcome at 3 months. The IS + ICH group was the most stable for mortality prediction [area under the receiver operating characteristic curve (AUC) 0.904 ± 0.025]. The IS group gave similar results, although variability between experiments was slightly higher (AUC 0.909 ± 0.032). The ICH group was the one in which RF had the most difficulty making adequate predictions (0.9837 vs. 0.7104 AUC). There were no major differences between the IS and IS + ICH groups in morbidity prediction (AUC 0.738 and 0.755). However, a Shapiro-Wilk test of the null hypothesis that the data follow a normal distribution rejected normality (W = 0.93546, p-value < 2.2e-16). Because the conditions required for a parametric test do not hold, we performed a paired Wilcoxon test under the null hypothesis that all groups have the same performance. The null hypothesis was rejected (p-value < 2.2e-16), so there are statistically significant differences between the IS and ICH groups.
In conclusion, the Random Forest machine learning algorithm can be used effectively in stroke patients to predict long-term mortality and morbidity outcomes.
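The abstract reports performance as AUC mean ± SD across repeated experiments and compares groups with a paired Wilcoxon test after a Shapiro-Wilk test rejected normality. The sketch below illustrates those evaluation steps with the Python standard library only. It computes a rank-based AUC (the probability that a positive case outscores a negative one), summarizes per-run AUCs as mean ± SD, and runs a two-sided Wilcoxon signed-rank test using the normal approximation. It is not the authors' code. It does not train a Random Forest or run the Shapiro-Wilk test, and it assumes that runs are paired by index, which the abstract does not specify. The function names are hypothetical, and the sample numbers are made up. For real analyses, exact or tie-corrected implementations such as R's `wilcox.test` or `scipy.stats.wilcoxon` are preferable, especially for small samples.

```python
import math
from statistics import NormalDist, mean, stdev


def roc_auc(labels, scores):
    """AUC as the probability that a randomly chosen positive case is scored
    higher than a randomly chosen negative case (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def summarize(aucs):
    """Mean and sample standard deviation of per-run AUCs ('mean ± sd')."""
    return mean(aucs), stdev(aucs)


def wilcoxon_signed_rank(x, y):
    """Two-sided paired Wilcoxon signed-rank test (normal approximation).
    Zero differences are dropped, tied |differences| get average ranks,
    and no tie or continuity correction is applied (a simplification)."""
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over a run of tied absolute differences.
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mu = n * (n + 1) / 4                                 # E[W+] under H0
    sigma = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)    # SD of W+ under H0
    z = (w_plus - mu) / sigma
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return w_plus, p_value


if __name__ == "__main__":
    # Toy, made-up numbers for illustration only (not from the study).
    print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
    run_aucs_a = [0.91, 0.88, 0.93, 0.90, 0.92]  # hypothetical group A runs
    run_aucs_b = [0.72, 0.70, 0.75, 0.69, 0.74]  # hypothetical group B runs
    m, sd = summarize(run_aucs_a)
    print(f"{m:.3f} ± {sd:.3f}")
    print(wilcoxon_signed_rank(run_aucs_a, run_aucs_b))
```

Pairing by run index reflects the "paired" design only if the same data splits were used for both groups. With few runs, the normal approximation is coarse. The exact p-values reported in the abstract come from much larger samples.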