不确定数据流中频繁模式的超结构挖掘

Hyper-structure mining of frequent patterns in uncertain data streams.

作者信息

Hewanadungodage Chandima, Xia Yuni, Lee Jaehwan John, Tu Yi-Cheng

机构信息

Department of Computer and Information Science, Indiana University-Purdue University Indianapolis, 723 West Michigan Street, Indianapolis, IN 46202-5132, USA.

Department of Electrical and Computer Engineering, Indiana University-Purdue University Indianapolis, 723 West Michigan Street, Indianapolis, IN 46202-5132, USA.

出版信息

Knowl Inf Syst. 2013 Oct 1;37(1):219-244. doi: 10.1007/s10115-012-0581-y.

DOI:10.1007/s10115-012-0581-y

PMID:24729652

原文链接:https://pmc.ncbi.nlm.nih.gov/articles/PMC3983695/

Abstract

Data uncertainty is inherent in many real-world applications such as sensor monitoring systems, location-based services, and medical diagnostic systems. Moreover, many real-world applications are now capable of producing continuous, unbounded data streams. During the recent years, new methods have been developed to find frequent patterns in uncertain databases; nevertheless, very limited work has been done in discovering frequent patterns in uncertain data streams. The current solutions for frequent pattern mining in uncertain streams take a FP-tree-based approach; however, recent studies have shown that FP-tree-based algorithms do not perform well in the presence of data uncertainty. In this paper, we propose two hyper-structure-based false-positive-oriented algorithms to efficiently mine frequent itemsets from streams of uncertain data. The first algorithm, UHS-Stream, is designed to find all frequent itemsets up to the current moment. The second algorithm, TFUHS-Stream, is designed to find frequent itemsets in an uncertain data stream in a time-fading manner. Experimental results show that the proposed hyper-structure-based algorithms outperform the existing tree-based algorithms in terms of accuracy, runtime, and memory usage.

摘要

数据不确定性在许多现实世界的应用中都存在，如传感器监测系统、基于位置的服务和医疗诊断系统。此外，许多现实世界的应用现在能够产生连续的、无界的数据流。近年来，已经开发出了新的方法来在不确定数据库中发现频繁模式；然而，在不确定数据流中发现频繁模式的工作却非常有限。当前用于不确定流中频繁模式挖掘的解决方案采用基于FP树的方法；然而，最近的研究表明，基于FP树的算法在存在数据不确定性的情况下表现不佳。在本文中，我们提出了两种基于超结构的面向误报的算法，以有效地从不确定数据流中挖掘频繁项集。第一种算法UHS-Stream旨在找到截至当前时刻的所有频繁项集。第二种算法TFUHS-Stream旨在以时间衰减的方式在不确定数据流中找到频繁项集。实验结果表明，所提出的基于超结构的算法在准确性、运行时间和内存使用方面优于现有的基于树的算法。

Suppr 超能文献

文献检索

文件翻译

深度研究

Suppr 超能文献

文献检索

文件翻译

深度研究

不确定数据流中频繁模式的超结构挖掘

Hyper-structure mining of frequent patterns in uncertain data streams.

作者信息

机构信息

出版信息

相似文献

不确定数据流中频繁模式的超结构挖掘

Hyper-structure mining of frequent patterns in uncertain data streams.

作者信息

机构信息

出版信息

相似文献