Dong Wenqin, Lee Eric W, Hertzberg Vicki Stover, Simpson Roy L, Ho Joyce C
Carnegie Mellon University.
Emory University.
Adv Databases Inf Syst. 2021 Aug;1450:50-60. doi: 10.1007/978-3-030-85082-1_5. Epub 2021 Jul 17.
Sequential pattern mining can be used to extract meaningful sequences from electronic health records. However, conventional sequential pattern mining algorithms that discover all frequent sequential patterns can incur a high computational cost and are susceptible to noise in the observations. Approximate sequential pattern mining techniques have been introduced to address these shortcomings, yet existing approximate methods either fail to reflect the true frequent sequential patterns or only target single-item event sequences. Multi-item event sequences are prominent in healthcare because a patient can receive multiple interventions during a single visit. To alleviate these issues, we propose GASP, a graph-based approximate sequential pattern mining approach that discovers frequent patterns for multi-item event sequences. Our approach compresses the sequential information into a concise graph structure, which has computational benefits. Empirical results on two healthcare datasets suggest that GASP outperforms existing approximate models by improving recoverability and extracts patterns with better predictive value.
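The abstract does not describe how GASP builds its graph, so the Python sketch below is a generic illustration only and is not the authors' algorithm. It shows two assumed ingredients: what the support of a multi-item sequential pattern means (each visit is an itemset, and a pattern's itemsets must appear in order in distinct, later visits), and one hypothetical way to summarize a sequence database as a weighted directed item graph. The names `contains`, `support`, and `build_transition_graph` are hypothetical.

```python
from collections import defaultdict
from itertools import product
from typing import Dict, FrozenSet, List, Tuple

# A multi-item event sequence: one itemset (e.g., interventions) per visit.
Sequence = List[FrozenSet[str]]


def contains(seq: Sequence, pattern: Sequence) -> bool:
    """True if every itemset of `pattern` is a subset of a distinct visit
    in `seq`, in order. Greedy earliest matching suffices for this check."""
    i = 0
    for visit in seq:
        if i < len(pattern) and pattern[i] <= visit:
            i += 1
    return i == len(pattern)


def support(db: List[Sequence], pattern: Sequence) -> float:
    """Fraction of sequences in `db` that contain `pattern`."""
    return sum(contains(s, pattern) for s in db) / len(db)


def build_transition_graph(db: List[Sequence]) -> Dict[Tuple[str, str], int]:
    """Hypothetical compression (not GASP's construction): edge (a, b) counts
    the sequences in which item a occurs in some visit and b in a later visit."""
    edges: Dict[Tuple[str, str], int] = defaultdict(int)
    for seq in db:
        seen = set()  # count each edge at most once per sequence
        for i, visit in enumerate(seq):
            for later in seq[i + 1:]:
                for a, b in product(visit, later):
                    seen.add((a, b))
        for edge in seen:
            edges[edge] += 1
    return dict(edges)


db = [
    [frozenset({"A", "B"}), frozenset({"C"})],
    [frozenset({"A"}), frozenset({"B", "C"})],
    [frozenset({"B"}), frozenset({"A", "C"})],
]
print(support(db, [frozenset({"A"}), frozenset({"C"})]))       # 2 of 3 sequences
print(support(db, [frozenset({"A", "B"}), frozenset({"C"})]))  # 1 of 3 sequences
print(build_transition_graph(db))
```

For a two-step pattern of single items, an edge count divided by the number of sequences equals that pattern's support, which hints at how a graph summary can answer some frequency queries without rescanning the raw sequences. Patterns whose steps contain several items, as in the second query above, are not recoverable from pairwise edges alone.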