Center of Excellence in Artificial Intelligence, Machine Learning and Smart Grid Technology, Department of Electrical Engineering, Faculty of Engineering, Chulalongkorn University, Bangkok, 10330, Thailand.
Aging Research Center, Karolinska Institutet, 171 65, Stockholm, Sweden.
Sci Rep. 2024 Oct 4;14(1):23052. doi: 10.1038/s41598-024-73570-x.
Stroke is one of the leading causes of death and disability worldwide. Early detection of symptoms can help predict stroke and support preventive lifestyle changes. Several machine learning (ML) methods have been developed for stroke prediction, but existing systems suffer from two main problems. First, the models are biased because the classes in the dataset are unevenly distributed, and recent research has not adequately addressed this imbalance. We use the Synthetic Minority Oversampling Technique (SMOTE) to balance the training data and reduce this bias. Second, existing models achieve relatively low classification accuracy. To improve accuracy, we propose a learning system that combines an autoencoder with a linear discriminant analysis (LDA) model. The autoencoder extracts relevant features from the feature space, and the extracted features are then fed into the LDA model for stroke classification. The hyperparameters of the LDA model are selected by grid search. Because the conventional accuracy metric alone does not truly reflect the performance of ML models, we evaluated the proposed model using accuracy, sensitivity, specificity, the receiver operating characteristic (ROC) curve, and the area under the curve (AUC). The experimental results show that the proposed model achieves a sensitivity of 98.51% and a specificity of 97.56%, with an accuracy of 99.24% and a balanced accuracy of 98.00%.
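The point about accuracy being misleading on imbalanced data can be illustrated with a small worked example. The sketch below is not the authors' code: the helper names (`ConfusionCounts`, `confusion_counts`, and the metric functions) and the toy labels are assumptions made for illustration, and it assumes both classes appear in the true labels. It computes sensitivity, specificity, accuracy, and balanced accuracy for a degenerate classifier that always predicts "no stroke".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    tp: int  # stroke cases predicted as stroke
    fn: int  # stroke cases predicted as no stroke
    tn: int  # non-stroke cases predicted as no stroke
    fp: int  # non-stroke cases predicted as stroke


def confusion_counts(y_true, y_pred, positive=1):
    """Count the four confusion-matrix cells for a binary problem."""
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1
    return ConfusionCounts(tp=tp, fn=fn, tn=tn, fp=fp)


def sensitivity(c):
    """True positive rate: fraction of stroke cases that were detected."""
    return c.tp / (c.tp + c.fn)


def specificity(c):
    """True negative rate: fraction of non-stroke cases correctly cleared."""
    return c.tn / (c.tn + c.fp)


def accuracy(c):
    """Fraction of all cases classified correctly."""
    return (c.tp + c.tn) / (c.tp + c.fn + c.tn + c.fp)


def balanced_accuracy(c):
    """Mean of sensitivity and specificity; insensitive to class ratio."""
    return (sensitivity(c) + specificity(c)) / 2


# Toy data (hypothetical, not from the paper): 5 stroke cases among 100
# patients, and a classifier that predicts "no stroke" for everyone.
y_true = [1] * 5 + [0] * 95
y_pred = [0] * 100

counts = confusion_counts(y_true, y_pred)
print(f"accuracy          = {accuracy(counts):.2f}")
print(f"sensitivity       = {sensitivity(counts):.2f}")
print(f"specificity       = {specificity(counts):.2f}")
print(f"balanced accuracy = {balanced_accuracy(counts):.2f}")
```

On this toy data the classifier scores high on accuracy while detecting no stroke cases at all, which is why sensitivity, specificity, and balanced accuracy are reported alongside accuracy when the classes are imbalanced.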