A Data Preparation Methodology in Data Mining Applied to Mortality Population Databases.

Authors

Pérez Joaquín, Iturbide Emmanuel, Olivares Víctor, Hidalgo Miguel, Martínez Alicia, Almanza Nelva

Affiliations

Tecnológico Nacional de México / CENIDET, Interior Internado Palmira s/n, Palmira, 62490, Cuernavaca, Morelos, Mexico.

Universidad Politécnica de Madrid, ETSII, Boadilla del Monte, Madrid, Spain.

Publication information

J Med Syst. 2015 Nov;39(11):152. doi: 10.1007/s10916-015-0312-5. Epub 2015 Sep 18.

Abstract

Data preparation is known to be the most time-consuming phase of the data mining process, taking 50% or even up to 70% of total project time. Current data mining methodologies are general purpose, and one of their limitations is that they offer no guidance on which particular tasks to carry out in a specific domain. This paper presents a new data preparation methodology oriented to the epidemiological domain, in which we have identified two sets of tasks: General Data Preparation and Specific Data Preparation. For both sets, the Cross-Industry Standard Process for Data Mining (CRISP-DM) is adopted as a guideline. The main contribution of our methodology is fourteen specialized tasks for this domain. To validate the proposed methodology, we developed a data mining system and applied the entire process to real mortality databases. The results were encouraging: using the methodology reduced some of the time-consuming tasks, and the data mining system revealed previously unknown and potentially useful patterns for the public health services in Mexico.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6e96/4575356/9cc5c95b2254/10916_2015_312_Fig1_HTML.jpg
