Gao Xinming, Gong Yongshun, Xu TianTian, Lu Jinhu, Zhao Yuhai, Dong Xiangjun
IEEE Trans Neural Netw Learn Syst. 2023 Feb;34(2):571-585. doi: 10.1109/TNNLS.2020.3041732. Epub 2023 Feb 3.
Studies of nonoccurring behavior (NOB) have attracted growing attention as a crucial part of behavioral science. Negative sequential pattern (NSP) mining is an effective method for discovering both NOB and occurring behaviors (OB), and it has been used successfully to analyze medical treatment and abnormal behavior patterns. NSP mining nevertheless remains an active and challenging research area, and most existing algorithms are inefficient in practice. The key weaknesses of existing NSP mining methods are: 1) an inefficient positive sequential pattern (PSP) mining process, 2) a strict constraint on negative containment, and 3) the lack of an effective negative sequential candidate (NSC) generation method. To address these weaknesses, we propose sc-NSP, an efficient NSP mining algorithm. First, we propose an improved PrefixSpan algorithm for the PSP mining process that uses a bitmap storage structure in place of the original structure. Second, sc-NSP loosens the frequency constraint and adopts the NSC generation method of positive and negative sequential patterns mining (PNSP), a classic NSP mining method. Third, a novel pruning strategy is designed to reduce the computational complexity of sc-NSP. Finally, sc-NSP obtains the support of each NSC using efficient bitwise operations. Theoretical analyses show that sc-NSP performs particularly well on data sets whose sequences contain a large number of elements and items. Extensive comparative experiments and case studies on health data show that sc-NSP is 10 times more efficient than other state-of-the-art methods, and that the number of NSPs it obtains is 5 times greater than that of other methods.
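The bitmap idea mentioned above can be illustrated with a minimal sketch. This is not the sc-NSP algorithm: it assumes a toy database of flat item lists (the paper's sequences consist of elements, i.e. itemsets), ignores the order of items, and does not implement the paper's negative containment definition. It only shows how one bit per data sequence lets support be computed with bitwise AND, NOT, and a population count. All names (`build_item_bitmaps`, `support`, `occurs_without`, `toy_db`) are hypothetical.

```python
from typing import Dict, List


def build_item_bitmaps(db: List[List[str]]) -> Dict[str, int]:
    """Map each item to an int whose bit i is set when sequence i contains the item."""
    bitmaps: Dict[str, int] = {}
    for sid, seq in enumerate(db):
        for item in set(seq):
            bitmaps[item] = bitmaps.get(item, 0) | (1 << sid)
    return bitmaps


def support(bitmap: int) -> int:
    """Support = number of set bits = number of data sequences flagged in the bitmap."""
    return bin(bitmap).count("1")


def occurs_without(bitmaps: Dict[str, int], present: str, absent: str,
                   n_sequences: int) -> int:
    """Bitmap of sequences that contain `present` and do not contain `absent`.

    Simplifying assumption: order is ignored, so this is only a toy stand-in
    for a nonoccurrence check, not the paper's containment rule.
    """
    mask = (1 << n_sequences) - 1  # keep only the bits that correspond to real sequences
    return bitmaps.get(present, 0) & ~bitmaps.get(absent, 0) & mask


# Hypothetical toy database: four data sequences of single items.
toy_db = [
    ["a", "b", "c"],
    ["a", "c"],
    ["b", "c"],
    ["a", "d"],
]

bitmaps = build_item_bitmaps(toy_db)
both_a_and_c = bitmaps["a"] & bitmaps["c"]                    # sequences 0 and 1
a_without_b = occurs_without(bitmaps, "a", "b", len(toy_db))  # sequences 1 and 3

print(support(both_a_and_c), support(a_without_b))
```

Because each support count reduces to a few integer operations, the database does not have to be rescanned for every candidate, which is the general motivation for bitmap storage in pattern mining.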