Parallel and distributed methods for incremental frequent itemset mining.

Authors

Otey Matthew Eric, Parthasarathy Srinivasan, Wang Chao, Veloso Adriano, Meira Wagner

Affiliation

Computer and Information Science Department, The Ohio State University, Columbus, OH 43210, USA.

Publication information

IEEE Trans Syst Man Cybern B Cybern. 2004 Dec;34(6):2439-50. doi: 10.1109/tsmcb.2004.836887.

DOI:10.1109/tsmcb.2004.836887

PMID:15619944

Abstract

Traditional methods for data mining typically make the assumption that the data is centralized, memory-resident, and static. This assumption is no longer tenable. Such methods waste computational and input/output (I/O) resources when data is dynamic, and they impose excessive communication overhead when data is distributed. Efficient implementation of incremental data mining methods is, thus, becoming crucial for ensuring system scalability and facilitating knowledge discovery when data is dynamic and distributed. In this paper, we address this issue in the context of the important task of frequent itemset mining. We first present an efficient algorithm which dynamically maintains the required information even in the presence of data updates without examining the entire dataset. We then show how to parallelize this incremental algorithm. We also propose a distributed asynchronous algorithm, which imposes minimal communication overhead for mining distributed dynamic datasets. Our distributed approach is capable of generating local models (in which each site has a summary of its own database) as well as the global model of frequent itemsets (in which all sites have a summary of the entire database). This ability permits our approach not only to generate frequent itemsets, but also to generate high-contrast frequent itemsets, which allows one to examine how the data is skewed over different sites.
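The ideas in the abstract can be illustrated with a small sketch. It is not the authors' algorithm: it naively counts every itemset up to a fixed size, which only works for toy data. It does show the three behaviours the abstract describes: updating counts from new transactions alone, combining per-site (local) counts into a global model, and flagging itemsets whose support differs strongly between sites. All names here (`IncrementalCounter`, `merge`, `high_contrast`, `min_gap`) are hypothetical, and the "gap between site supports" definition of high contrast is an assumption made for illustration.

```python
from collections import Counter
from itertools import combinations


class IncrementalCounter:
    """Toy local model: support counts for all itemsets up to max_size."""

    def __init__(self, max_size=2):
        self.max_size = max_size
        self.counts = Counter()
        self.n_transactions = 0

    def add_batch(self, transactions):
        # Only the new transactions are scanned; earlier data is never revisited.
        for transaction in transactions:
            items = sorted(set(transaction))
            for k in range(1, self.max_size + 1):
                for itemset in combinations(items, k):
                    self.counts[itemset] += 1
            self.n_transactions += 1

    def support(self, itemset):
        # Relative support of one itemset at this site (0.0 if the site is empty).
        if self.n_transactions == 0:
            return 0.0
        return self.counts[itemset] / self.n_transactions

    def frequent(self, min_support):
        # Itemsets whose relative support meets the threshold.
        return {
            s: self.support(s)
            for s in self.counts
            if self.n_transactions and self.support(s) >= min_support
        }


def merge(sites):
    """Global model: sum the local counts of every site."""
    merged = IncrementalCounter(max_size=sites[0].max_size)
    for site in sites:
        merged.counts.update(site.counts)
        merged.n_transactions += site.n_transactions
    return merged


def high_contrast(sites, min_gap):
    """Itemsets whose support differs by at least min_gap between two sites.

    This definition is an assumption chosen for the sketch.
    """
    all_itemsets = set()
    for site in sites:
        all_itemsets.update(site.counts)
    result = {}
    for itemset in all_itemsets:
        supports = [site.support(itemset) for site in sites]
        gap = max(supports) - min(supports)
        if gap >= min_gap:
            result[itemset] = gap
    return result
```

In use, each site would call `add_batch` whenever new transactions arrive, and `merge` would be called only when a global view is needed, so that sites exchange count summaries rather than raw data.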

Similar articles

Parallel and distributed methods for incremental frequent itemset mining.

IEEE Trans Syst Man Cybern B Cybern. 2004 Dec;34(6):2439-50. doi: 10.1109/tsmcb.2004.836887.

Association rule mining in peer-to-peer systems.

IEEE Trans Syst Man Cybern B Cybern. 2004 Dec;34(6):2426-38. doi: 10.1109/tsmcb.2004.836888.

Distributed data mining on grids: services, tools, and applications.

IEEE Trans Syst Man Cybern B Cybern. 2004 Dec;34(6):2451-65. doi: 10.1109/tsmcb.2004.836890.

An efficient algorithm for mining closed itemsets.

J Zhejiang Univ Sci. 2004 Jan;5(1):8-15. doi: 10.1007/BF02839306.

A hybrid model for improving response time in distributed data mining.

IEEE Trans Syst Man Cybern B Cybern. 2004 Dec;34(6):2466-79. doi: 10.1109/tsmcb.2004.836885.

Mining multilevel and location-aware service patterns in mobile web environments.

IEEE Trans Syst Man Cybern B Cybern. 2004 Dec;34(6):2480-5. doi: 10.1109/tsmcb.2004.836886.

Rule mining and classification in a situation assessment application: a belief-theoretic approach for handling data imperfections.

IEEE Trans Syst Man Cybern B Cybern. 2007 Dec;37(6):1446-59. doi: 10.1109/tsmcb.2007.903536.

The Mining Algorithm of Maximum Frequent Itemsets Based on Frequent Pattern Tree.

Comput Intell Neurosci. 2022 May 18;2022:7022168. doi: 10.1155/2022/7022168. eCollection 2022.

Using Greedy algorithm: DBSCAN revisited II.

J Zhejiang Univ Sci. 2004 Nov;5(11):1405-12. doi: 10.1631/jzus.2004.1405.

Scalable model-based clustering for large databases based on data summarization.

IEEE Trans Pattern Anal Mach Intell. 2005 Nov;27(11):1710-9. doi: 10.1109/TPAMI.2005.226.
