Rustam Furqan, Ishaq Abid, Munir Kashif, Almutairi Mubarak, Aslam Naila, Ashraf Imran
Department of Software Engineering, School of Systems and Technology, University of Management and Technology, Lahore 54770, Pakistan.
Department of Computer Science, Khwaja Fareed University of Engineering and Information Technology, Rahim Yar Khan 64200, Pakistan.
Diagnostics (Basel). 2022 Jun 15;12(6):1474. doi: 10.3390/diagnostics12061474.
Cardiovascular diseases (CVDs) are regarded as the leading cause of death, accounting for 32% of all deaths worldwide. Because of the large number of symptoms related to age, gender, demographics, and ethnicity, diagnosing CVDs is a complex and challenging task. Furthermore, the shortage of experienced staff and medical experts and the unavailability of appropriate testing equipment put the lives of millions of people at risk, especially in under-developed and developing countries. Electronic health records (EHRs) have recently been used to diagnose several diseases and show potential for CVD diagnosis as well. However, the accuracy and efficacy of EHR-based CVD diagnosis are limited by the lack of an appropriate feature set. Often, the feature set is very small and does not provide enough features for machine learning models to obtain a good fit. This study addresses this problem by proposing the novel use of feature extraction from a convolutional neural network (CNN). An ensemble model is designed in which a CNN is used to enlarge the feature set, and the enlarged set is used to train three linear models (a stochastic gradient descent classifier, logistic regression, and a support vector machine) that are combined in a soft-voting ensemble. Extensive experiments are performed to analyze performance for different ratios of feature sets to the training dataset. Performance analysis is carried out using four different datasets, and results are compared with recent approaches used for CVD diagnosis. The proposed model achieves an accuracy of 0.93 and scores of 0.92 each for precision, recall, and F1. The results indicate both the superiority of the proposed approach and the generalization of the ensemble model across multiple datasets.
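The soft-voting rule mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `soft_vote` and the probability values are hypothetical, and the CNN feature-extraction step is omitted. The sketch only shows how soft voting averages the class probabilities reported by several classifiers and picks the class with the highest mean.

```python
from typing import List, Sequence


def soft_vote(model_probs: Sequence[Sequence[Sequence[float]]]) -> List[int]:
    """Return the predicted class per sample under soft voting.

    model_probs[m][i][c] is model m's probability that sample i is class c.
    """
    n_models = len(model_probs)
    n_samples = len(model_probs[0])
    predictions = []
    for i in range(n_samples):
        n_classes = len(model_probs[0][i])
        # Mean probability of each class across all models.
        avg = [
            sum(model_probs[m][i][c] for m in range(n_models)) / n_models
            for c in range(n_classes)
        ]
        # Predicted class is the one with the highest mean probability.
        predictions.append(max(range(n_classes), key=avg.__getitem__))
    return predictions


# Made-up probabilities [P(no CVD), P(CVD)] for two patients,
# standing in for the outputs of three hypothetical classifiers.
sgd_probs = [[0.55, 0.45], [0.8, 0.2]]
lr_probs = [[0.55, 0.45], [0.7, 0.3]]
svm_probs = [[0.10, 0.90], [0.6, 0.4]]

labels = soft_vote([sgd_probs, lr_probs, svm_probs])
print(labels)
```

For the first patient, two of the three classifiers lean slightly toward class 0, so a majority (hard) vote would pick class 0. Soft voting picks class 1 because the third classifier is far more confident, which pulls the mean probability toward class 1.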