A systematic map of medical data preprocessing in knowledge discovery.

Affiliations

Software Project Management Research Team, ENSIAS, University Mohammed V of Rabat, Morocco.

Department of Informatics and Systems, Faculty of Computer Science, University of Murcia, Spain.

Publication information

Comput Methods Programs Biomed. 2018 Aug;162:69-85. doi: 10.1016/j.cmpb.2018.05.007. Epub 2018 May 5.

DOI:10.1016/j.cmpb.2018.05.007

PMID:29903496

Abstract

BACKGROUND AND OBJECTIVE

Data mining (DM) has, over the last decade, received increased attention in the medical domain and has been widely used to analyze medical datasets in order to extract useful knowledge and previously unknown patterns. However, historical medical data are often inconsistent, noisy, imbalanced, incomplete and high-dimensional. These problems can seriously bias predictive models and reduce the performance of DM techniques. Data preprocessing is therefore an essential step in knowledge discovery: it improves the quality of the data and makes the data suitable for DM techniques. The objective of this paper is to review the use of preprocessing techniques on clinical datasets.

METHODS

We performed a systematic mapping of studies on the application of data preprocessing to healthcare that were published between January 2000 and December 2017. A search string was defined on the basis of the mapping questions and the PICO categories. The search string was then applied to digital databases covering the fields of computer science and medical informatics in order to identify relevant studies. The studies were initially selected by reading their titles, abstracts and keywords. Those selected at that stage were then reviewed against a set of inclusion and exclusion criteria in order to eliminate any that were not relevant. This process resulted in 126 primary studies.
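The search-and-screen procedure described above can be illustrated with a minimal sketch. It is not the authors' protocol: the PICO terms, the `build_search_string` and `passes_screening` helpers, and the keyword-based inclusion rule are all hypothetical, chosen only to show how synonyms are typically joined with OR inside a category, how categories are joined with AND, and how a date window can serve as one inclusion criterion.

```python
def build_search_string(term_groups):
    """OR the synonyms inside each category, then AND the categories together."""
    clauses = []
    for synonyms in term_groups.values():
        # Quote multi-word terms so a search engine treats them as phrases.
        quoted = [f'"{term}"' if " " in term else term for term in synonyms]
        clauses.append("(" + " OR ".join(quoted) + ")")
    return " AND ".join(clauses)


def passes_screening(study, first_year=2000, last_year=2017,
                     keywords=("preprocessing",)):
    """Hypothetical inclusion rule: published in the window and on topic."""
    in_window = first_year <= study["year"] <= last_year
    text = (study["title"] + " " + study["abstract"]).lower()
    return in_window and any(keyword in text for keyword in keywords)


# Hypothetical terms; the abstract does not list the actual search terms.
pico_terms = {
    "population": ["medical", "clinical", "healthcare"],
    "intervention": ["preprocessing", "data cleaning", "data reduction"],
    "outcome": ["data mining", "knowledge discovery"],
}

query = build_search_string(pico_terms)

candidates = [
    {"year": 2015, "title": "Preprocessing of clinical records", "abstract": ""},
    {"year": 1998, "title": "Preprocessing of clinical records", "abstract": ""},
    {"year": 2016, "title": "Hospital staffing survey", "abstract": ""},
]
selected = [study for study in candidates if passes_screening(study)]
```

In practice the automated filter would only narrow the candidate pool; the abstract states that the final selection was made by reviewing studies against inclusion and exclusion criteria.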

RESULTS

The selected studies were analyzed and classified according to their publication years and channels, research type, empirical type and contribution type. The findings of this mapping study revealed that researchers have paid a considerable amount of attention to preprocessing in medical DM in the last decade. A significant number of the selected studies used data reduction and data cleaning tasks. Moreover, the disciplines in which preprocessing has received the most attention are cardiology, endocrinology and oncology.
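To make the two most frequently reported task families concrete, the following is a minimal sketch of one cleaning step (mean imputation of missing values) and one reduction step (dropping constant features). These particular techniques, the helper names `impute_mean` and `drop_low_variance`, and the toy data are assumptions chosen for illustration; the abstract does not say which specific methods the primary studies used.

```python
from statistics import mean, pvariance


def impute_mean(rows):
    """Cleaning: replace each missing value (None) with its column mean."""
    columns = list(zip(*rows))
    column_means = [mean(v for v in col if v is not None) for col in columns]
    return [
        [column_means[j] if value is None else value
         for j, value in enumerate(row)]
        for row in rows
    ]


def drop_low_variance(rows, threshold=0.0):
    """Reduction: keep only the columns whose variance exceeds the threshold."""
    columns = list(zip(*rows))
    kept = [j for j, col in enumerate(columns) if pvariance(col) > threshold]
    return [[row[j] for j in kept] for row in rows], kept


# Toy records (hypothetical): three patients, three numeric features.
records = [
    [1.0, None, 5.0],
    [3.0, 4.0, 5.0],
    [None, 8.0, 5.0],
]

cleaned = impute_mean(records)
reduced, kept_columns = drop_low_variance(cleaned)
```

Here the third feature is constant across patients, so it carries no information for a classifier and is removed after imputation.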

CONCLUSIONS

Researchers should develop and implement standards for an effective integration of multiple medical data types. Moreover, we identified the need to perform literature reviews.

Similar articles

A systematic map of medical data preprocessing in knowledge discovery.

Comput Methods Programs Biomed. 2018 Aug;162:69-85. doi: 10.1016/j.cmpb.2018.05.007. Epub 2018 May 5.

A Systematic Mapping Study of Data Preparation in Heart Disease Knowledge Discovery.

J Med Syst. 2018 Dec 13;43(1):17. doi: 10.1007/s10916-018-1134-z.

Data preprocessing for heart disease classification: A systematic literature review.

Comput Methods Programs Biomed. 2020 Oct;195:105635. doi: 10.1016/j.cmpb.2020.105635. Epub 2020 Jul 3.

Personal health data: A systematic mapping study.

Int J Med Inform. 2018 Oct;118:86-98. doi: 10.1016/j.ijmedinf.2018.08.006. Epub 2018 Aug 4.

Knowledge discovery in cardiology: A systematic literature review.

Int J Med Inform. 2017 Jan;97:12-32. doi: 10.1016/j.ijmedinf.2016.09.005. Epub 2016 Sep 14.

Multilevel Weighted Support Vector Machine for Classification on Healthcare Data with Missing Values.

PLoS One. 2016 May 19;11(5):e0155119. doi: 10.1371/journal.pone.0155119. eCollection 2016.

How has the impact of 'care pathway technologies' on service integration in stroke care been measured and what is the strength of the evidence to support their effectiveness in this respect?

Int J Evid Based Healthc. 2008 Mar;6(1):78-110. doi: 10.1111/j.1744-1609.2007.00098.x.

An efficient data preprocessing approach for large scale medical data mining.

Technol Health Care. 2015;23(2):153-60. doi: 10.3233/THC-140887.

Systematic mapping study of data mining-based empirical studies in cardiology.

Health Informatics J. 2019 Sep;25(3):741-770. doi: 10.1177/1460458217717636. Epub 2017 Aug 1.

On designing a biosignal-based fetal state assessment system: A systematic mapping study.

Comput Methods Programs Biomed. 2022 Apr;216:106671. doi: 10.1016/j.cmpb.2022.106671. Epub 2022 Feb 1.

Cited by

Enhancing Automatic PT Tagging for MEDLINE Citations Using Transformer-Based Models.

ArXiv. 2025 Jun 3:arXiv:2506.03321v1.

Algorithms and tools for data-driven omics integration to achieve multilayer biological insights: a narrative review.

J Transl Med. 2025 Apr 10;23(1):425. doi: 10.1186/s12967-025-06446-x.

Unsupervised natural language processing in the identification of patients with suspected COVID-19 infection.

Cad Saude Publica. 2023 Dec 4;39(11):e00243722. doi: 10.1590/0102-311XPT243722. eCollection 2023.

CADUCEO: A Platform to Support Federated Healthcare Facilities through Artificial Intelligence.

Healthcare (Basel). 2023 Aug 4;11(15):2199. doi: 10.3390/healthcare11152199.

Improvement method for cervical cancer detection: A comparative analysis.

Oncol Res. 2022 Oct 10;29(5):365-376. doi: 10.32604/or.2022.025897. eCollection 2021.

Introduction of Health Information Technology Professionals for Data Mining in Hospitals.

Iran J Public Health. 2021 Nov;50(11):2355-2357. doi: 10.18502/ijph.v50i11.7598.

Reviewing Machine Learning and Image Processing Based Decision-Making Systems for Breast Cancer Imaging.

J Med Syst. 2021 Jan 4;45(1):8. doi: 10.1007/s10916-020-01689-1.

A Systematic Mapping Study of Data Preparation in Heart Disease Knowledge Discovery.

J Med Syst. 2018 Dec 13;43(1):17. doi: 10.1007/s10916-018-1134-z.
