Fast Utility Mining on Sequence Data.

Publication information

IEEE Trans Cybern. 2021 Feb;51(2):487-500. doi: 10.1109/TCYB.2020.2970176. Epub 2021 Jan 15.

DOI:10.1109/TCYB.2020.2970176

Abstract

High-utility sequential pattern (HUSP) mining is an emerging topic in the field of knowledge discovery in databases. It consists of discovering subsequences that have a high utility (importance) in sequences; these subsequences are referred to as HUSPs. HUSPs can be applied to many real-life applications, such as market basket analysis, e-commerce recommendations, click-stream analysis, and route planning. Several algorithms have been proposed to efficiently mine utility-based useful sequential patterns. However, because the search space explodes combinatorially for low utility thresholds and large-scale data, the performance of these algorithms is unsatisfactory in terms of runtime and memory usage. Hence, this article proposes an efficient algorithm for the task of HUSP mining, called HUSP mining with UL-list (HUSP-ULL). It utilizes a lexicographic q-sequence (LQS)-tree and a utility-linked (UL)-list structure to quickly discover HUSPs. Furthermore, two pruning strategies are introduced in HUSP-ULL to obtain tight upper bounds on the utility of the candidate sequences and reduce the search space by pruning unpromising candidates early. Substantial experiments on both real-life and synthetic datasets showed that HUSP-ULL can effectively and efficiently discover the complete set of HUSPs and that it outperforms the state-of-the-art algorithms.
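To make the notion of sequence utility concrete, here is a minimal Python sketch. It is not the HUSP-ULL algorithm and uses neither the LQS-tree nor the UL-list; it only illustrates the utility definition commonly used in the HUSP literature, which the abstract does not spell out and is therefore an assumption here: each item has a unit profit, each q-sequence records purchased quantities, and the utility of a pattern in a q-sequence is the maximum total over all of its occurrences. All names (`unit_profit`, `sequence_utility`, `pattern_utility`, `is_high_utility`) and the toy data are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

QItemset = Dict[str, int]        # item -> quantity (illustrative representation)
QSequence = List[QItemset]       # ordered list of q-itemsets
Pattern = List[Tuple[str, ...]]  # ordered list of itemsets, without quantities


def itemset_utility(items: Tuple[str, ...], q_itemset: QItemset,
                    unit_profit: Dict[str, int]) -> Optional[int]:
    """Utility of one pattern itemset inside one q-itemset, or None if absent."""
    if not all(item in q_itemset for item in items):
        return None
    return sum(q_itemset[item] * unit_profit[item] for item in items)


def sequence_utility(pattern: Pattern, q_seq: QSequence,
                     unit_profit: Dict[str, int]) -> int:
    """Maximum utility over all occurrences of `pattern` in `q_seq` (0 if none)."""
    neg = float("-inf")
    # best[k] = best utility found so far for the first k pattern itemsets
    best = [0] + [neg] * len(pattern)
    for q_itemset in q_seq:
        # Iterate k downwards so a q-itemset matches at most one pattern itemset.
        for k in range(len(pattern), 0, -1):
            if best[k - 1] == neg:
                continue
            u = itemset_utility(pattern[k - 1], q_itemset, unit_profit)
            if u is not None:
                best[k] = max(best[k], best[k - 1] + u)
    return 0 if best[-1] == neg else best[-1]


def pattern_utility(pattern: Pattern, database: List[QSequence],
                    unit_profit: Dict[str, int]) -> int:
    """Utility of a pattern in the database: sum of its per-sequence utilities."""
    return sum(sequence_utility(pattern, s, unit_profit) for s in database)


def is_high_utility(pattern: Pattern, database: List[QSequence],
                    unit_profit: Dict[str, int], min_util: int) -> bool:
    """A pattern counts as a HUSP when its utility reaches the threshold."""
    return pattern_utility(pattern, database, unit_profit) >= min_util


# Toy data, invented for illustration only.
unit_profit = {"a": 2, "b": 5, "c": 1}
database = [
    [{"a": 1, "b": 2}, {"c": 3}, {"a": 2}],
    [{"a": 3}, {"b": 1, "c": 2}],
]
print(pattern_utility([("a",), ("c",)], database, unit_profit))
```

A practical miner avoids evaluating every candidate this way; upper bounds on utility, such as those the abstract attributes to the two pruning strategies, let it discard unpromising candidates without an exact computation.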

Similar articles

Fast Utility Mining on Sequence Data.

IEEE Trans Cybern. 2021 Feb;51(2):487-500. doi: 10.1109/TCYB.2020.2970176. Epub 2021 Jan 15.

Mining actionable combined high utility incremental and associated sequential patterns.

PLoS One. 2023 Mar 29;18(3):e0283365. doi: 10.1371/journal.pone.0283365. eCollection 2023.

Mining of high utility-probability sequential patterns from uncertain databases.

PLoS One. 2017 Jul 25;12(7):e0180931. doi: 10.1371/journal.pone.0180931. eCollection 2017.

HUOPM: High-Utility Occupancy Pattern Mining.

IEEE Trans Cybern. 2020 Mar;50(3):1195-1208. doi: 10.1109/TCYB.2019.2896267. Epub 2019 Feb 20.

WildSpan: mining structured motifs from protein sequences.

Algorithms Mol Biol. 2011 Mar 31;6(1):6. doi: 10.1186/1748-7188-6-6.

Mining Contiguous Sequential Generators in Biological Sequences.

IEEE/ACM Trans Comput Biol Bioinform. 2016 Sep-Oct;13(5):855-867. doi: 10.1109/TCBB.2015.2495132. Epub 2015 Oct 26.

Scalable and Efficient Approach for High Temporal Fuzzy Utility Pattern Mining.

IEEE Trans Cybern. 2023 Dec;53(12):7672-7685. doi: 10.1109/TCYB.2022.3198661. Epub 2023 Nov 29.

An incremental high-utility mining algorithm with transaction insertion.

ScientificWorldJournal. 2015;2015:161564. doi: 10.1155/2015/161564. Epub 2015 Feb 25.

IPHM: Incremental periodic high-utility mining algorithm in dynamic and evolving data environments.

Heliyon. 2024 Sep 12;10(18):e37761. doi: 10.1016/j.heliyon.2024.e37761. eCollection 2024 Sep 30.

An Efficient Incremental Mining Algorithm for Discovering Sequential Pattern in Wireless Sensor Network Environments.

Sensors (Basel). 2018 Dec 21;19(1):29. doi: 10.3390/s19010029.

Cited by

Improved adaptive-phase fuzzy high utility pattern mining algorithm based on tree-list structure for intelligent decision systems.

Sci Rep. 2024 Jan 10;14(1):945. doi: 10.1038/s41598-023-50375-y.

Mining actionable combined high utility incremental and associated sequential patterns.

PLoS One. 2023 Mar 29;18(3):e0283365. doi: 10.1371/journal.pone.0283365. eCollection 2023.

Supervised sequential pattern mining of event sequences in sport to identify important patterns of play: An application to rugby union.

PLoS One. 2021 Sep 23;16(9):e0256329. doi: 10.1371/journal.pone.0256329. eCollection 2021.
