Chen Yen-Liang, Huang Tony Cheng-Kui
Department of Information Management, National Central University, Chung-Li, Taiwan.
IEEE Trans Syst Man Cybern B Cybern. 2005 Oct;35(5):959-72. doi: 10.1109/tsmcb.2005.847741.
Given a sequence database and a minimum support threshold, the task of sequential pattern mining is to discover the complete set of sequential patterns in the database. The discovered sequential patterns show which items are frequently bought together and in what order they appear. However, they do not reveal the time gaps between successive items in a pattern. Accordingly, Chen et al. proposed a generalization of sequential patterns, called time-interval sequential patterns, which reveals not only the order of items but also the time intervals between successive items. A time-interval sequential pattern has a form such as (A, I2, B, I1, C), meaning that we buy A first, then buy B after an interval of I2, and finally buy C after an interval of I1, where I2 and I1 are predetermined time ranges. Although this type of pattern supplies the missing time-gap information, it causes the sharp boundary problem: when a time interval is near the boundary of two predetermined time ranges, we either ignore or overemphasize it. Therefore, this paper uses the concept of fuzzy sets to extend the original research so that fuzzy time-interval sequential patterns are discovered from databases. Two efficient algorithms, the fuzzy time-interval (FTI)-Apriori algorithm and the FTI-PrefixSpan algorithm, are developed for mining fuzzy time-interval sequential patterns. In our simulation results, FTI-PrefixSpan outperforms FTI-Apriori, not only in computing time but also in scalability with respect to various parameters.
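The sharp boundary problem described in the abstract can be illustrated with a small sketch. Everything in it is an assumption made for illustration, not the paper's definition: the two range names ("short", "long"), the crisp boundary at 10 time units, the linear membership functions between 5 and 15, and the function names crisp_range and fuzzy_membership.

```python
def crisp_range(gap, boundary=10.0):
    """Assign a time gap to exactly one predetermined range (assumed boundary)."""
    return "short" if gap <= boundary else "long"


def fuzzy_membership(gap, lo=5.0, hi=15.0):
    """Return the degree to which a gap is 'short' and 'long'.

    Assumed shape: fully 'short' up to lo, fully 'long' from hi,
    and a linear transition in between.
    """
    if gap <= lo:
        long_degree = 0.0
    elif gap >= hi:
        long_degree = 1.0
    else:
        long_degree = (gap - lo) / (hi - lo)
    return {"short": 1.0 - long_degree, "long": long_degree}


# Two gaps that differ only slightly fall into different crisp ranges,
# while their fuzzy membership degrees stay close to each other.
for gap in (9.9, 10.1):
    print(gap, crisp_range(gap), fuzzy_membership(gap))
```

With crisp ranges, gaps of 9.9 and 10.1 land in different ranges even though they are nearly identical; with the assumed fuzzy memberships, both belong to "short" and "long" to a degree close to 0.5, so neither gap is ignored or overemphasized.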