Mining Top-k Useful Negative Sequential Patterns via Learning

Publication information

IEEE Trans Neural Netw Learn Syst. 2019 Sep;30(9):2764-2778. doi: 10.1109/TNNLS.2018.2886199. Epub 2019 Jan 10.


Abstract

As an important tool for behavior informatics, negative sequential patterns (NSPs) (such as missing a medical treatment) are sometimes much more informative than positive sequential patterns (PSPs) (e.g., attending a medical treatment) in many applications. However, NSP mining is at an early stage and faces many challenging problems, including 1) how to mine an expected number of NSPs; 2) how to select useful NSPs; and 3) how to reduce high time consumption. To solve the first problem, we propose an algorithm, Topk-NSP, to mine the k most frequent negative patterns. In Topk-NSP, we first mine the top-k PSPs using existing methods and then mine the top-k NSPs from these PSPs using an idea similar to top-k PSP mining. To solve the remaining two problems, we propose three optimization strategies for Topk-NSP. The first strategy takes the influence of PSPs into account when selecting useful top-k NSPs: we introduce two weights, w_n and w_p, to express the user's degree of preference for NSPs and PSPs, respectively, and select useful NSPs by a weighted support, wsup. The second strategy merges wsup with an interestingness metric to select more useful NSPs. The third strategy introduces pruning to reduce the high computational cost of Topk-NSP. Finally, we combine these strategies into an optimized algorithm, Topk-NSP+. To the best of our knowledge, Topk-NSP+ is the first algorithm that can mine the top-k useful NSPs. Experimental results on four synthetic and two real-life data sets show that Topk-NSP+ is very efficient in mining the top-k NSPs in terms of computational cost and scalability.
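
The abstract says that Topk-NSP first mines the top-k PSPs and then derives the top-k NSPs from them, but it does not give the support definition it uses. The Python sketch below illustrates that two-stage idea only under stated assumptions: every element is a single item, each candidate NSP negates exactly one element of a mined PSP, and NSP support follows the set-theory identity sup(NSP) = sup(MPS) - sup(PSP) used by earlier NSP miners such as e-NSP. It is not the authors' implementation. The preference weights, wsup, the interestingness metric and the pruning strategy are all omitted, and the function names (`contains`, `support`, `nsp_support`, `top_k_nsps`) are hypothetical.

```python
from typing import List, Sequence, Tuple


def contains(seq: Sequence[str], pattern: Sequence[str]) -> bool:
    """True if `pattern` occurs in `seq` as a (not necessarily contiguous) subsequence."""
    it = iter(seq)
    # `item in it` consumes the iterator up to the match, so matches stay in order.
    return all(item in it for item in pattern)


def support(db: Sequence[Sequence[str]], pattern: Sequence[str]) -> int:
    """Number of data sequences in `db` that contain `pattern`."""
    return sum(contains(s, pattern) for s in db)


def nsp_support(db: Sequence[Sequence[str]], pattern: Sequence[str], neg_index: int) -> int:
    """Support of the NSP obtained by negating pattern[neg_index] (hypothetical helper).

    Assumed single-negation semantics: a data sequence contains the NSP if it
    contains the remaining positive items (the maximal positive subsequence, MPS)
    but not the full positive `pattern`. Every sequence containing `pattern` also
    contains the MPS, hence sup(NSP) = sup(MPS) - sup(pattern).
    """
    mps = tuple(pattern[:neg_index]) + tuple(pattern[neg_index + 1:])
    return support(db, mps) - support(db, pattern)


def top_k_nsps(
    db: Sequence[Sequence[str]], psps: Sequence[Sequence[str]], k: int
) -> List[Tuple[int, str]]:
    """Rank single-negation NSP candidates derived from the given PSPs by support."""
    candidates = []
    for p in psps:
        if len(p) < 2:  # negating a 1-item pattern would leave an empty MPS
            continue
        for i in range(len(p)):
            label = " ".join("¬" + x if j == i else x for j, x in enumerate(p))
            candidates.append((nsp_support(db, p, i), label))
    # Highest support first; ties broken by label so the result is deterministic.
    candidates.sort(key=lambda c: (-c[0], c[1]))
    return candidates[:k]


# Toy data: four sequences of single-item elements (an assumption for brevity).
db = [("a", "b", "c"), ("a", "c"), ("a", "c", "d"), ("b", "c")]
# Stand-in for the output of any existing top-k PSP miner.
psps = [("a", "c"), ("b", "c"), ("a", "b", "c")]

ranked = top_k_nsps(db, psps, 2)  # [(2, 'a ¬b c'), (2, '¬b c')]
```

Deriving NSP support from PSP supports avoids matching negated elements against the data directly. A practical miner would reuse the supports already computed during the PSP stage instead of rescanning the database through `support`, as this sketch does for simplicity.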
