Data preprocessing for heart disease classification: A systematic literature review.

Author information

Benhar H, Idri A, Fernández-Alemán J L

Affiliations

Software Project Management Research Team, ENSIAS, University Mohammed V in Rabat, Morocco.

Software Project Management Research Team, ENSIAS, University Mohammed V in Rabat, Morocco; CSEHS-MSDA, Mohammed VI Polytechnic University, Benguerir, Morocco.

Publication information

Comput Methods Programs Biomed. 2020 Oct;195:105635. doi: 10.1016/j.cmpb.2020.105635. Epub 2020 Jul 3.

DOI: 10.1016/j.cmpb.2020.105635

PMID: 32652383

Abstract

CONTEXT

Early detection of heart disease is an important challenge, since 17.3 million people die of heart disease every year. Moreover, any error in the diagnosis of cardiac disease can be dangerous and put a patient's life at risk. Accurate diagnosis is therefore critical in cardiology. Data Mining (DM) classification techniques have been used to diagnose heart diseases, but they are still limited by data quality challenges such as inconsistencies, noise, missing data, outliers, high dimensionality and imbalanced data. Data preprocessing (DP) techniques have therefore been used to prepare data, with the goal of improving the performance of DM-based heart disease prediction systems.

OBJECTIVE

The purpose of this study is to review and summarize the current evidence on the use of preprocessing techniques in heart disease classification with regard to: (1) the DP tasks and techniques most frequently used, (2) the impact of DP tasks and techniques on the performance of classification in cardiology, (3) the overall performance of classifiers when using DP techniques, and (4) comparisons of different classifier-preprocessing combinations in terms of accuracy rate.

METHOD

A systematic literature review was carried out by identifying and analyzing empirical studies on the application of data preprocessing in heart disease classification published between January 2000 and June 2019. A total of 49 studies were selected and analyzed according to the aforementioned criteria.

RESULTS

The review results show that data reduction is the most used preprocessing task in cardiology, followed by data cleaning. In general, preprocessing either maintained or improved the performance of heart disease classifiers. Some combinations, such as (ANN + PCA), (ANN + CHI) and (SVM + PCA), are promising in terms of accuracy. However, the deployment of these models in real-world diagnosis decision support systems is subject to several risks and limitations due to the lack of interpretability.
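As an illustration of the data reduction task mentioned in the results, the following is a minimal sketch of chi-squared feature scoring, assuming that "CHI" in the abstract refers to chi-squared feature selection. It is not taken from the reviewed studies; the function names, feature names and toy data are hypothetical and serve only to show how such a filter ranks features before a classifier is trained.

```python
from collections import Counter


def chi_square_score(feature, labels):
    """Pearson chi-squared statistic between a categorical feature and the class label."""
    n = len(feature)
    observed = Counter(zip(feature, labels))   # joint counts (feature value, label)
    feature_totals = Counter(feature)          # marginal counts per feature value
    label_totals = Counter(labels)             # marginal counts per class
    score = 0.0
    for f_value, f_count in feature_totals.items():
        for l_value, l_count in label_totals.items():
            expected = f_count * l_count / n   # count expected under independence
            score += (observed[(f_value, l_value)] - expected) ** 2 / expected
    return score


def select_top_k(columns, labels, k):
    """Return the names of the k features with the highest chi-squared score."""
    scores = {name: chi_square_score(values, labels) for name, values in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical toy data: 8 patients, binary features, label 1 = disease present.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
columns = {
    "chest_pain":  [1, 1, 1, 1, 0, 0, 0, 0],  # perfectly associated with the label
    "smoker":      [1, 1, 1, 0, 1, 0, 0, 0],  # partially associated
    "left_handed": [1, 0, 1, 0, 1, 0, 1, 0],  # independent of the label
}

selected = select_top_k(columns, labels, k=2)
print(selected)
```

In this toy example the independent feature scores zero and is dropped, which is the sense in which such a filter reduces dimensionality before classification.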

Similar articles

Data preprocessing for heart disease classification: A systematic literature review.

Comput Methods Programs Biomed. 2020 Oct;195:105635. doi: 10.1016/j.cmpb.2020.105635. Epub 2020 Jul 3.

A systematic map of medical data preprocessing in knowledge discovery.

Comput Methods Programs Biomed. 2018 Aug;162:69-85. doi: 10.1016/j.cmpb.2018.05.007. Epub 2018 May 5.

Knowledge discovery in cardiology: A systematic literature review.

Int J Med Inform. 2017 Jan;97:12-32. doi: 10.1016/j.ijmedinf.2016.09.005. Epub 2016 Sep 14.

A Systematic Mapping Study of Data Preparation in Heart Disease Knowledge Discovery.

J Med Syst. 2018 Dec 13;43(1):17. doi: 10.1007/s10916-018-1134-z.

Preprocessing Breast Cancer Data to Improve the Data Quality, Diagnosis Procedure, and Medical Care Services.

Cancer Inform. 2020 May 27;19:1176935120917955. doi: 10.1177/1176935120917955. eCollection 2020.

Data Preprocessing Techniques for AI and Machine Learning Readiness: Scoping Review of Wearable Sensor Data in Cancer Care.

JMIR Mhealth Uhealth. 2024 Sep 27;12:e59587. doi: 10.2196/59587.

Pediatric Injury Surveillance From Uncoded Emergency Department Admission Records in Italy: Machine Learning-Based Text-Mining Approach.

JMIR Public Health Surveill. 2023 Jul 12;9:e44467. doi: 10.2196/44467.

DCT-Based Preprocessing Approach for ICA in Hyperspectral Data Analysis.

Sensors (Basel). 2018 Apr 8;18(4):1138. doi: 10.3390/s18041138.

Systematic Comparison of the Influence of Different Data Preprocessing Methods on the Performance of Gait Classifications Using Machine Learning.

Front Bioeng Biotechnol. 2020 Apr 15;8:260. doi: 10.3389/fbioe.2020.00260. eCollection 2020.

Folic acid supplementation and malaria susceptibility and severity among people taking antifolate antimalarial drugs in endemic areas.

Cochrane Database Syst Rev. 2022 Feb 1;2(2022):CD014217. doi: 10.1002/14651858.CD014217.

Cited by

Supervised machine learning algorithms for the classification of obesity levels using anthropometric indices derived from bioelectrical impedance analysis.

Sci Rep. 2025 Aug 21;15(1):30681. doi: 10.1038/s41598-025-15264-6.

Recent advances in machine learning for precision diagnosis and treatment of esophageal disorders.

World J Gastroenterol. 2025 Jun 21;31(23):105076. doi: 10.3748/wjg.v31.i23.105076.

A comprehensive review of machine learning for heart disease prediction: challenges, trends, ethical considerations, and future directions.

Front Artif Intell. 2025 May 13;8:1583459. doi: 10.3389/frai.2025.1583459. eCollection 2025.

Development and validation of a deep learning-enhanced prediction model for the likelihood of pulmonary embolism.

Front Med (Lausanne). 2025 Feb 6;12:1506363. doi: 10.3389/fmed.2025.1506363. eCollection 2025.

Machine Learning Algorithm-Based Prediction of Diabetes Among Female Population Using PIMA Dataset.

Healthcare (Basel). 2024 Dec 29;13(1):37. doi: 10.3390/healthcare13010037.

Data-driven classification and explainable-AI in the field of lung imaging.

Front Big Data. 2024 Sep 19;7:1393758. doi: 10.3389/fdata.2024.1393758. eCollection 2024.

Machine-Learning-Based Prediction of 1-Year Arrhythmia Recurrence after Ventricular Tachycardia Ablation in Patients with Structural Heart Disease.

Bioengineering (Basel). 2023 Dec 1;10(12):1386. doi: 10.3390/bioengineering10121386.

Epidemiological Determinants of Patient Non-Conveyance to the Hospital in an Emergency Medical Service Environment.

Int J Environ Res Public Health. 2023 Jul 20;20(14):6404. doi: 10.3390/ijerph20146404.

RETRACTED ARTICLE: Securing health care data through blockchain enabled collaborative machine learning.

Soft Comput. 2023;27(14):9941-9954. doi: 10.1007/s00500-023-08330-6. Epub 2023 May 23.

Data-Centric AI for Healthcare Fraud Detection.

SN Comput Sci. 2023;4(4):389. doi: 10.1007/s42979-023-01809-x. Epub 2023 May 11.
