Department of Computer Science & Information Technology, Dr. Babasaheb Ambedkar Marathwada University, Aurangabad, India.
Information Systems Department, Prince Sultan University, Riyadh, Saudi Arabia.
Comput Math Methods Med. 2021 Dec 20;2021:8500314. doi: 10.1155/2021/8500314. eCollection 2021.
Cardiovascular disease (CVD) is one of the most common causes of death, killing approximately 17 million people annually. The main causes of CVD are myocardial infarction and the failure of the heart to pump blood normally. Doctors can diagnose heart failure (HF) from electronic medical records on the basis of a patient's symptoms and clinical laboratory investigations. However, accurate diagnosis of HF requires medical resources and expert practitioners that are not always available, which makes diagnosis challenging. Predicting a patient's condition with machine learning algorithms is therefore needed to save time and effort. This paper proposes a machine-learning-based approach that identifies the most important correlated features in patients' electronic clinical records. The SelectKBest function was applied with the chi-squared statistical test to determine the most important features, and feature engineering was then applied to create new, strongly correlated features in order to train machine learning models and obtain promising results. Classification algorithms with optimised hyperparameters (SVM, KNN, Decision Tree, Random Forest, and Logistic Regression) were trained on two different datasets. The first dataset, called Cleveland, consisted of 303 records. The second dataset, which was used for predicting HF, consisted of 299 records. Experimental results showed that the Random Forest algorithm achieved accuracy, precision, recall, and F1 scores of 95%, 97.62%, 95.35%, and 96.47%, respectively, during the test phase for the second dataset. The same algorithm achieved accuracy scores of 100% for the first dataset and 97.68% for the second dataset, while 100% precision, recall, and F1 scores were reached for both datasets.
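The feature-selection step named in the abstract (SelectKBest with the chi-squared test, as provided by scikit-learn) can be illustrated with a minimal pure-Python sketch. This is not the authors' code: the function names `chi2_scores` and `select_k_best` and the toy data are hypothetical, and the scoring follows the usual chi-squared recipe for non-negative features (compare each feature's per-class total with the total expected if the feature were independent of the class).

```python
from typing import List, Sequence, Tuple


def chi2_scores(X: Sequence[Sequence[float]], y: Sequence[int]) -> List[float]:
    """Chi-squared score of each non-negative feature against the class label.

    Hypothetical helper for illustration; assumes every value in X is >= 0.
    """
    n_samples = len(X)
    n_features = len(X[0])
    classes = sorted(set(y))
    # Fraction of samples that belong to each class.
    class_prob = {c: sum(1 for label in y if label == c) / n_samples for c in classes}

    scores = []
    for j in range(n_features):
        feature_total = sum(row[j] for row in X)
        score = 0.0
        for c in classes:
            # Observed: total of feature j over the samples of class c.
            observed = sum(row[j] for row, label in zip(X, y) if label == c)
            # Expected: what that total would be if feature and class were independent.
            expected = class_prob[c] * feature_total
            if expected > 0:
                score += (observed - expected) ** 2 / expected
        scores.append(score)
    return scores


def select_k_best(
    X: Sequence[Sequence[float]], y: Sequence[int], k: int
) -> Tuple[List[int], List[List[float]]]:
    """Keep the k features with the highest chi-squared scores."""
    scores = chi2_scores(X, y)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    keep = sorted(ranked[:k])  # restore original column order
    return keep, [[row[j] for j in keep] for row in X]


# Toy data (made up): columns 0 and 1 track the label, column 2 does not.
X_toy = [
    [1, 0, 3],
    [1, 0, 2],
    [0, 1, 3],
    [0, 1, 2],
]
y_toy = [1, 1, 0, 0]

toy_scores = chi2_scores(X_toy, y_toy)
kept_columns, X_reduced = select_k_best(X_toy, y_toy, k=2)
print(toy_scores)
print(kept_columns)
```

On the toy data the two label-tracking columns are kept and the uninformative third column is dropped. In practice the reduced feature matrix would then be passed to the classifiers listed in the abstract, with hyperparameters tuned on the training split only.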