Fang Gang, Liu Wenbin, Wang Lixin
Institute of Computing Science and Technology, Guangzhou University, Guangzhou 510006, China.
Comput Biol Chem. 2020 Oct;88:107316. doi: 10.1016/j.compbiolchem.2020.107316. Epub 2020 Jun 23.
Ischemic stroke is a common neurological disorder and remains the principal cause of serious long-term disability worldwide. Selecting features related to stroke prognosis is highly valuable for effective intervention and treatment. In this study, an integrated machine learning approach was used to select features as prognostic factors of stroke on The International Stroke Trial (IST) dataset, taking into account common problems of feature selection and prediction in medical datasets. First, the importance of features was ranked by the Shapiro-Wilk algorithm, and the Pearson correlations between features were analyzed. Then, we used Recursive Feature Elimination with Cross-Validation (RFECV) to select robust features, running it with linear SVC, Random Forest Classifier, Extra Trees Classifier, AdaBoost Classifier, and Multinomial Naïve Bayes Classifier as the estimator in turn. Furthermore, the importance of the selected features was determined by the Random Forest Classifier and the Shapiro-Wilk algorithm. Finally, twenty-three selected features were used by SVC, MLP, Random Forest, and AdaBoost classifiers to predict RVISINF (infarct visible on CT) in acute stroke on the IST dataset. The results suggest that the selected features could be used to infer the long-term prognosis of acute stroke with high accuracy, and could also be used to extract factors related to RVISINF, which is associated with large artery occlusion (LAO) in ischemic stroke patients.
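The abstract mentions analyzing Pearson correlations between features and combining the results of several RFECV runs, but does not say how redundant pairs were flagged or how the per-estimator selections were merged. The block below is a minimal pure-Python sketch of both bookkeeping steps under stated assumptions: the 0.8 correlation cut-off, the "at least `min_votes` estimators" voting rule, and the feature names `f1`, `f2`, `f3` with their toy values are all hypothetical and not taken from the paper. The RFECV runs themselves are not reproduced here (scikit-learn provides `sklearn.feature_selection.RFECV` for that); only their outputs are modeled as sets of feature names.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("constant feature: correlation undefined")
    return cov / math.sqrt(var_x * var_y)


def correlated_pairs(features, threshold=0.8):
    """Return (name_a, name_b, r) for every feature pair with |r| >= threshold.

    `features` maps a feature name to its column of values.
    The default 0.8 cut-off is a hypothetical choice, not from the paper.
    """
    pairs = []
    for a, b in combinations(sorted(features), 2):
        r = pearson(features[a], features[b])
        if abs(r) >= threshold:
            pairs.append((a, b, r))
    return pairs


def consensus_features(selections, min_votes):
    """Keep features chosen by at least `min_votes` per-estimator selections.

    `selections` maps an estimator name to the set of features an RFECV run
    kept with that estimator. This voting rule is an assumption for
    illustration only; the paper does not specify how runs were combined.
    """
    votes = {}
    for chosen in selections.values():
        for name in chosen:
            votes[name] = votes.get(name, 0) + 1
    return sorted(name for name, count in votes.items() if count >= min_votes)


if __name__ == "__main__":
    # Hypothetical toy columns: f2 is exactly 2 * f1, f3 is only weakly related.
    toy = {"f1": [1, 2, 3, 4, 5], "f2": [2, 4, 6, 8, 10], "f3": [3, 1, 4, 1, 5]}
    print(correlated_pairs(toy))  # [('f1', 'f2', 1.0)]

    # Hypothetical outputs of three RFECV runs with different estimators.
    runs = {
        "LinearSVC": {"f1", "f3"},
        "RandomForest": {"f1", "f2", "f3"},
        "ExtraTrees": {"f1"},
    }
    print(consensus_features(runs, min_votes=2))  # ['f1', 'f3']
```

Requiring agreement across several estimators is one simple way to favor features that are not artifacts of a single model's inductive bias; a stricter rule (selected by every estimator) or a looser one (selected by any) would trade stability against recall.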