Benhar H, Idri A, Fernández-Alemán J L
Software Project Management Research Team, ENSIAS, University Mohammed V in Rabat, Morocco.
Software Project Management Research Team, ENSIAS, University Mohammed V in Rabat, Morocco; CSEHS-MSDA, Mohammed VI Polytechnic University, Benguerir, Morocco.
Comput Methods Programs Biomed. 2020 Oct;195:105635. doi: 10.1016/j.cmpb.2020.105635. Epub 2020 Jul 3.
Early detection of heart disease is an important challenge, since heart disease claims 17.3 million lives each year. Moreover, any error in the diagnosis of cardiac disease can be dangerous and put a patient's life at risk. Accurate diagnosis is therefore critical in cardiology. Data mining (DM) classification techniques have been used to diagnose heart disease but are still limited by data quality problems such as inconsistencies, noise, missing data, outliers, high dimensionality and imbalanced data. Data preprocessing (DP) techniques have therefore been used to prepare data, with the goal of improving the performance of DM-based heart disease prediction systems.
The purpose of this study is to review and summarize the current evidence on the use of preprocessing techniques in heart disease classification with regard to: (1) the DP tasks and techniques most frequently used, (2) the impact of DP tasks and techniques on the performance of classification in cardiology, (3) the overall performance of classifiers when DP techniques are used, and (4) comparisons of different classifier-preprocessing combinations in terms of accuracy rate.
A systematic literature review was carried out by identifying and analyzing empirical studies on the application of data preprocessing in heart disease classification published between January 2000 and June 2019. A total of 49 studies were selected and analyzed according to the aforementioned criteria.
The review results show that data reduction is the most frequently used preprocessing task in cardiology, followed by data cleaning. In general, preprocessing either maintained or improved the performance of heart disease classifiers. Some combinations, such as (ANN + PCA), (ANN + CHI) and (SVM + PCA), are promising in terms of accuracy. However, the deployment of these models in real-world diagnosis decision support systems is subject to several risks and limitations due to their lack of interpretability.
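As an illustration of the data reduction task mentioned above, the following is a minimal sketch of chi-squared (CHI) feature scoring, one common way of ranking categorical features before a classifier is trained. It is not taken from any of the reviewed studies: the function names `chi2_score` and `select_top_k`, the feature names and the toy records are all hypothetical, and a real pipeline would more likely rely on an established library than on hand-written code.

```python
from collections import Counter


def chi2_score(feature, labels):
    """Pearson chi-squared statistic between a categorical feature and class labels."""
    n = len(feature)
    if n == 0 or n != len(labels):
        raise ValueError("feature and labels must be non-empty and of equal length")
    observed = Counter(zip(feature, labels))  # joint counts per (value, class)
    feature_totals = Counter(feature)         # marginal counts per feature value
    label_totals = Counter(labels)            # marginal counts per class
    score = 0.0
    for value, value_count in feature_totals.items():
        for cls, cls_count in label_totals.items():
            # Count expected if the feature and the class were independent.
            expected = value_count * cls_count / n
            score += (observed[(value, cls)] - expected) ** 2 / expected
    return score


def select_top_k(columns, labels, k):
    """Return the names of the k features with the highest chi-squared score."""
    scores = {name: chi2_score(values, labels) for name, values in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Toy, made-up records for illustration only (NOT data from the reviewed studies).
columns = {
    "chest_pain":  [1, 1, 1, 0, 0, 0, 1, 0],
    "smoker":      [1, 0, 1, 0, 1, 0, 1, 0],
    "left_handed": [0, 1, 0, 1, 1, 0, 1, 0],
}
labels = [1, 1, 1, 0, 0, 0, 1, 0]  # 1 = disease present, 0 = absent

selected = select_top_k(columns, labels, k=2)
print(selected)
```

In this toy data the feature that coincides with the label scores highest and the feature that is statistically independent of the label scores zero, so the latter is the one dropped. The retained columns would then be passed to a classifier such as an ANN or SVM.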