A Data Preparation Methodology in Data Mining Applied to Mortality Population Databases.

Author information

Pérez Joaquín, Iturbide Emmanuel, Olivares Víctor, Hidalgo Miguel, Martínez Alicia, Almanza Nelva

Affiliations

Tecnológico Nacional de México / CENIDET, Interior Internado Palmira s/n, Palmira, 62490, Cuernavaca, Morelos, Mexico.

Universidad Politécnica de Madrid, ETSII, Boadilla del Monte, Madrid, Spain.

Publication information

J Med Syst. 2015 Nov;39(11):152. doi: 10.1007/s10916-015-0312-5. Epub 2015 Sep 18.

DOI:10.1007/s10916-015-0312-5

PMID:26385549

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4575356/

Abstract

The data preparation phase is widely regarded as the most time-consuming part of the data mining process, taking up to 50%, or even up to 70%, of the total project time. Current data mining methodologies are general purpose, and one of their limitations is that they give no guidance on which particular tasks to carry out in a specific domain. This paper presents a new data preparation methodology oriented to the epidemiological domain, in which we identify two sets of tasks: General Data Preparation and Specific Data Preparation. For both sets, the Cross-Industry Standard Process for Data Mining (CRISP-DM) is adopted as a guideline. The main contribution of our methodology is fourteen specialized tasks for this domain. To validate the proposed methodology, we developed a data mining system and applied the entire process to real mortality databases. The results were encouraging: the methodology reduced some of the time-consuming tasks, and the data mining system revealed previously unknown and potentially useful patterns for the public health services in Mexico.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6e96/4575356/9cc5c95b2254/10916_2015_312_Fig1_HTML.jpg
