Mining of high utility-probability sequential patterns from uncertain databases.

Authors

Zhang Binbin, Lin Jerry Chun-Wei, Fournier-Viger Philippe, Li Ting

Affiliations

Department of Biochemistry and Molecular Biology, Health Science Center of Shenzhen University, Shenzhen, China.

School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, China.

Publication information

PLoS One. 2017 Jul 25;12(7):e0180931. doi: 10.1371/journal.pone.0180931. eCollection 2017.

Abstract

High-utility sequential pattern mining (HUSPM) has become an important topic in the field of data mining. Several HUSPM algorithms have been designed to mine high-utility sequential patterns (HUSPs), and they have been applied in real-life settings such as consumer behavior analysis and event detection in sensor networks. Nonetheless, most studies on HUSPM have focused on mining HUSPs in precise data. In real life, however, uncertainty is an important factor because data are collected using various types of sensors that are more or less accurate. Hence, data collected in a real-life database can be annotated with existence probabilities. This paper presents a novel pattern mining framework called high utility-probability sequential pattern mining (HUPSPM) for mining high utility-probability sequential patterns (HUPSPs) in uncertain sequence databases. A baseline algorithm with three optional pruning strategies is presented to mine HUPSPs. Moreover, to speed up the mining process, a projection mechanism is designed to create a database projection for each processed sequence, which is smaller than the original database. Thus, the number of unpromising candidates can be greatly reduced, as can the execution time for mining HUPSPs. Extensive experiments on both real-life and synthetic datasets show that the designed algorithm performs well in terms of runtime, number of candidates, memory usage, and scalability for different minimum utility and minimum probability thresholds.
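
The abstract does not give the formal definitions, so the following Python sketch only illustrates the two-threshold idea behind an HUPSP under stated assumptions. It assumes that each sequence carries a single existence probability, that a pattern's utility in a sequence is the maximum over its occurrences, and that a pattern's utility and probability in the database are sums over the sequences containing it. The function names (`max_utility`, `evaluate`, `is_hupsp`), the toy database, and the absolute thresholds are hypothetical. The sketch checks one candidate by brute force and does not implement the paper's pruning strategies or projection mechanism.

```python
from typing import List, Tuple

NEG_INF = float("-inf")

# Assumed data model: a sequence is a list of (item, utility) events
# paired with one existence probability for the whole sequence.
Event = Tuple[str, float]
UncertainSequence = Tuple[List[Event], float]


def max_utility(pattern: List[str], events: List[Event]) -> float:
    """Best utility over all occurrences of `pattern` as a subsequence
    of `events`; returns -inf if the pattern does not occur."""
    # best[k] = best utility found so far for the first k pattern items
    best = [0.0] + [NEG_INF] * len(pattern)
    for item, util in events:
        # Go backwards over k so one event matches at most one position.
        for k in range(len(pattern), 0, -1):
            if pattern[k - 1] == item and best[k - 1] != NEG_INF:
                best[k] = max(best[k], best[k - 1] + util)
    return best[len(pattern)]


def evaluate(pattern: List[str], db: List[UncertainSequence]) -> Tuple[float, float]:
    """Return (total utility, total probability) of `pattern` in `db`
    under the assumptions stated above."""
    utility, probability = 0.0, 0.0
    for events, prob in db:
        u = max_utility(pattern, events)
        if u != NEG_INF:  # the pattern occurs in this sequence
            utility += u
            probability += prob
    return utility, probability


def is_hupsp(pattern: List[str], db: List[UncertainSequence],
             min_util: float, min_prob: float) -> bool:
    """A pattern qualifies only if it meets BOTH thresholds."""
    utility, probability = evaluate(pattern, db)
    return utility >= min_util and probability >= min_prob


# Hypothetical toy database, not taken from the paper.
toy_db: List[UncertainSequence] = [
    ([("a", 2.0), ("b", 3.0), ("a", 5.0), ("c", 1.0)], 0.9),
    ([("a", 1.0), ("c", 4.0)], 0.6),
    ([("b", 2.0), ("c", 2.0)], 0.8),
]

print(evaluate(["a", "c"], toy_db))
print(is_hupsp(["a", "c"], toy_db, min_util=10.0, min_prob=1.2))
```

In this toy setup, a pattern with high utility is still rejected when the sequences supporting it are too unlikely to exist, which is the distinction the framework draws between HUSPs and HUPSPs.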

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a952/5526537/86db2e6435fb/pone.0180931.g001.jpg
