Ferraz Inhaúma Neves, Garcia Ana Cristina Bicharra
ADDLabs - Active Documentation and Design, Instituto de Computação - Universidade Federal Fluminense, Av. Gen. Milton Tavares de Souza, s/n - 24210-340 Niterói, RJ, Brazil.
Springerplus. 2013 Sep 11;2:452. doi: 10.1186/2193-1801-2-452. eCollection 2013.
Data mining emerged to address the problem of transforming data into useful knowledge. Although most data mining techniques, such as association rule mining, can substantially reduce the search effort over large data sets, their output often exceeds the amount of information a person can manage. Conversely, important association rules may be overlooked because of the support threshold setting, a highly subjective metric that nonetheless underlies most data mining techniques. This paper presents a study of the effects, in terms of precision and recall, of a data preparation technique called SemPrune, which is built on a domain ontology. SemPrune is intended for the pre- and post-processing phases of data mining. Identifying generalization/specialization relations, as well as composition/decomposition relations, is the key to applying SemPrune successfully.
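To make the support-threshold issue in the abstract concrete, here is a minimal Python sketch. It is not SemPrune itself, whose algorithm the abstract does not describe. It only illustrates, under assumed data, how replacing items by their parent concept in a generalization (is-a) hierarchy can lift an itemset above a minimum support, and how precision and recall of a mined rule set could be measured. The `IS_A` mapping, the sample transactions, the `MIN_SUPPORT` value, and all function names are hypothetical.

```python
# Hypothetical is-a (generalization) relations: specific item -> parent concept.
IS_A = {
    "skim milk": "milk",
    "whole milk": "milk",
    "rye bread": "bread",
    "white bread": "bread",
}


def generalize(transaction, is_a):
    """Replace each item by its parent concept when the ontology has one."""
    return frozenset(is_a.get(item, item) for item in transaction)


def support(itemset, transactions):
    """Fraction of transactions that contain every item of the itemset."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def precision_recall(found, relevant):
    """Precision and recall of a set of found rules against relevant rules."""
    found, relevant = set(found), set(relevant)
    true_positives = len(found & relevant)
    precision = true_positives / len(found) if found else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


# Assumed toy data: every basket holds some milk and some bread,
# but no specific pair of products occurs more than once.
transactions = [
    frozenset({"skim milk", "rye bread"}),
    frozenset({"whole milk", "white bread"}),
    frozenset({"skim milk", "white bread"}),
    frozenset({"whole milk", "rye bread"}),
]
generalized = [generalize(t, IS_A) for t in transactions]

MIN_SUPPORT = 0.5  # assumed threshold, chosen only for illustration

raw_support = support({"skim milk", "rye bread"}, transactions)
gen_support = support({"milk", "bread"}, generalized)

print(raw_support >= MIN_SUPPORT)  # the specific itemset is pruned
print(gen_support >= MIN_SUPPORT)  # the generalized itemset survives
```

In this toy data the specific pair falls below the threshold while the generalized pair does not, which is the kind of overlooked association the abstract attributes to the support setting. A real ontology would have several levels and composition/decomposition relations as well, which this sketch leaves out.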