
A novel association rule mining approach using TID intermediate itemset.

Author information

Aqra Iyad, Herawan Tutut, Abdul Ghani Norjihan, Akhunzada Adnan, Ali Akhtar, Bin Razali Ramdan, Ilahi Manzoor, Raymond Choo Kim-Kwang

Affiliations

Department of Information Systems, Faculty of Computer Science and Information Technology, University of Malaya, Kuala Lumpur, Malaysia

Department of Computer Science, COMSATS Institute of Information Technology (CIIT), Islamabad, Pakistan.

Publication information

PLoS One. 2018 Jan 19;13(1):e0179703. doi: 10.1371/journal.pone.0179703. eCollection 2018.

Abstract

Designing an efficient association rule mining (ARM) algorithm for multilevel knowledge-based transactional databases that is suitable for real-world deployment is of paramount concern. However, dynamic decision making that modifies the threshold, either to minimize or to maximize the output knowledge, forces existing state-of-the-art algorithms to rescan the entire database. This process incurs a heavy computation cost and is not feasible for real-time applications. This paper efficiently addresses the problem of dynamically updating the threshold for a given purpose. It presents a novel ARM approach that creates an intermediate itemset and applies a threshold to extract categorical frequent itemsets with diverse threshold values, improving overall efficiency because the whole database no longer needs to be scanned. After the entire itemset is built, the real support can be obtained without rebuilding the itemset (e.g., the itemset list is intersected to obtain the actual support). Moreover, the algorithm supports extracting many frequent itemsets according to a pre-determined minimum support with an independent purpose. Additionally, the experimental results demonstrate that the proposed approach can be deployed in any mining system in a fully parallel mode, consequently increasing the efficiency of the real-time association rule discovery process. The proposed approach outperforms the existing state of the art and shows promising results that reduce computation cost, increase accuracy, and produce all possible itemsets.
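
The idea of obtaining support by intersecting transaction-ID (TID) lists, so that a changed threshold does not trigger a database rescan, can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the function names (`build_tid_lists`, `support`, `frequent_itemsets`), the `max_size` parameter, the brute-force enumeration, and the toy transactions are all assumptions made for illustration only.

```python
from itertools import combinations


def build_tid_lists(transactions):
    """Scan the database once, mapping each item to the set of TIDs that contain it."""
    tid_lists = {}
    for tid, items in enumerate(transactions):
        for item in items:
            tid_lists.setdefault(item, set()).add(tid)
    return tid_lists


def support(itemset, tid_lists):
    """Support count of an itemset: size of the intersection of its items' TID sets."""
    tid_sets = [tid_lists.get(item, set()) for item in itemset]
    if not tid_sets:
        return 0
    return len(set.intersection(*tid_sets))


def frequent_itemsets(tid_lists, min_support, max_size=3):
    """Brute-force enumeration (illustrative only) of itemsets meeting min_support."""
    # Only items that are frequent on their own can appear in a frequent itemset.
    items = sorted(i for i, tids in tid_lists.items() if len(tids) >= min_support)
    result = {}
    for k in range(1, max_size + 1):
        for combo in combinations(items, k):
            count = support(combo, tid_lists)
            if count >= min_support:
                result[combo] = count
    return result


# Hypothetical toy database; each set is one transaction.
transactions = [
    {"bread", "milk"},
    {"bread", "diaper", "beer", "eggs"},
    {"milk", "diaper", "beer", "cola"},
    {"bread", "milk", "diaper", "beer"},
    {"bread", "milk", "diaper", "cola"},
]

tid_lists = build_tid_lists(transactions)       # the only pass over the data
loose = frequent_itemsets(tid_lists, min_support=3)
strict = frequent_itemsets(tid_lists, min_support=4)  # new threshold, no rescan
```

Both calls to `frequent_itemsets` reuse the same `tid_lists` structure, which is the property the abstract emphasizes: the threshold can change while the transactions are read only once. A practical miner would prune candidates level by level rather than enumerate every combination.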


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/896f/5774682/5757dd6ef58d/pone.0179703.g001.jpg
