IEEE Trans Neural Netw Learn Syst. 2019 Sep;30(9):2764-2778. doi: 10.1109/TNNLS.2018.2886199. Epub 2019 Jan 10.
As an important tool for behavior informatics, negative sequential patterns (NSPs), such as missing a medical treatment, are sometimes much more informative than positive sequential patterns (PSPs), such as attending a medical treatment. However, NSP mining is at an early stage and faces many challenging problems, including: 1) how to mine an expected number of NSPs; 2) how to select useful NSPs; and 3) how to reduce the high time consumption. To solve the first problem, we propose an algorithm, Topk-NSP, to mine the k most frequent negative patterns. In Topk-NSP, we first mine the top-k PSPs using existing methods, and then use an approach similar to top-k PSP mining to mine the top-k NSPs from these PSPs. To solve the remaining two problems, we propose three optimization strategies for Topk-NSP. First, to account for the influence of PSPs when selecting useful top-k NSPs, we introduce two weights that express the user's degree of preference for NSPs and PSPs, respectively, and select useful NSPs by a weighted support, wsup. Second, we merge wsup with an interestingness metric to select more useful NSPs. Third, we introduce a pruning strategy to reduce the high computational cost of Topk-NSP. Finally, we propose an optimized version of Topk-NSP that, to the best of our knowledge, is the first algorithm that can mine the top-k useful NSPs. Experimental results on four synthetic and two real-life data sets show that the optimized algorithm is very efficient in mining the top-k NSPs in terms of computational cost and scalability.
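The selection of top-k NSPs by weighted support described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the formula for wsup, so the sketch assumes a plain linear combination of an NSP's own support and the support of its corresponding PSP. The names `Candidate`, `weighted_support`, `top_k`, `w_neg`, and `w_pos`, as well as the sample patterns and support values, are hypothetical and chosen only for illustration.

```python
import heapq
from typing import List, NamedTuple


class Candidate(NamedTuple):
    pattern: str        # readable form of the negative pattern; "!" marks an absent element
    nsp_support: float  # relative support of the negative pattern itself
    psp_support: float  # relative support of its corresponding positive pattern


def weighted_support(c: Candidate, w_neg: float, w_pos: float) -> float:
    # ASSUMPTION: a linear mix of the two supports. The paper's actual
    # definition of wsup is not stated in the abstract and may differ.
    return w_neg * c.nsp_support + w_pos * c.psp_support


def top_k(candidates: List[Candidate], k: int,
          w_neg: float = 0.5, w_pos: float = 0.5) -> List[Candidate]:
    # heapq.nlargest returns the k highest-scoring items in descending order.
    return heapq.nlargest(
        k, candidates, key=lambda c: weighted_support(c, w_neg, w_pos)
    )


# Made-up candidates, purely for demonstration.
candidates = [
    Candidate("<a !b c>", 0.40, 0.10),
    Candidate("<a !c>", 0.30, 0.50),
    Candidate("<!a b>", 0.20, 0.05),
]

if __name__ == "__main__":
    for c in top_k(candidates, 2):
        print(c.pattern, weighted_support(c, 0.5, 0.5))
```

Under this assumed scoring, changing the two weights changes which patterns are ranked highest: with all weight on the NSP support the ranking follows raw NSP frequency, while shifting weight toward the PSP support favors negative patterns whose positive counterparts are frequent.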