Rashid Zaman, Marwan Hassani, Boudewijn F. van Dongen
Process Analytics Group, Faculty of Mathematics and Computer Science, Eindhoven University of Technology, Eindhoven, Netherlands.
Front Big Data. 2021 Oct 6;4:705243. doi: 10.3389/fdata.2021.705243. eCollection 2021.
In the context of process mining, event logs consist of process instances called cases. Conformance checking is a process mining task that inspects whether a log is conformant with an existing process model; the inspection additionally quantifies the conformance in an explainable manner. Online conformance checking processes streaming event logs, providing timely insights into running cases and enabling prompt mitigation of non-conformance, if any. State-of-the-art online conformance checking approaches bound memory either by limiting the number of stored events per case or by limiting the number of stored cases to a fixed window width. The former technique still requires unbounded memory, as the number of cases to store is unlimited, while the latter technique forgets running, not yet concluded, cases in order to stay within the limited window width. Consequently, the processing system may later encounter events that, according to the process model, represent some intermediate activity of a case that has already been forgotten; we refer to such events as orphan events. The naïve way to cope with an orphan event is either to exclude its case from conformance checking or to treat it as an altogether new case. However, both options can produce misleading process insights, for instance, overestimated non-conformance. To bound memory while still effectively incorporating orphan events into processing, we propose an imputation of missing-prefix approach for such events. Our approach uses the existing process model to impute the missing prefix. Furthermore, we leverage case storage management to increase the accuracy of the prefix prediction: we propose a systematic forgetting mechanism that preferentially forgets those cases whose prefixes can be reliably regenerated should a future orphan event of theirs arrive.
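To make the idea concrete, the following is a minimal, hypothetical sketch of the mechanism the abstract describes: a bounded case store that forgets old cases, detects orphan events, and imputes a missing prefix from the process model. The toy model, activity names, and the breadth-first prefix search are illustrative assumptions, not the paper's actual algorithm, which operates on the semantics of the full process model.

```python
from collections import OrderedDict, deque

# Toy sequential process model (illustrative assumption): a -> b -> c -> d.
MODEL = {"a": ["b"], "b": ["c"], "c": ["d"], "d": []}
START = "a"

def impute_prefix(activity, model=MODEL, start=START):
    """Return a shortest model-conformant activity sequence from the start
    activity up to (but excluding) `activity`, found via breadth-first search."""
    queue = deque([[start]])
    while queue:
        path = queue.popleft()
        if path[-1] == activity:
            return path[:-1]  # the imputed prefix excludes the orphan activity
        for nxt in model[path[-1]]:
            queue.append(path + [nxt])
    return []  # activity unreachable in the model

class BoundedCaseStore:
    """Keeps at most `max_cases` running cases; least-recently-seen
    cases are forgotten first (a simple stand-in for the paper's
    systematic forgetting mechanism)."""

    def __init__(self, max_cases):
        self.max_cases = max_cases
        self.cases = OrderedDict()  # case id -> list of observed activities

    def observe(self, case_id, activity):
        if case_id not in self.cases:
            if activity != START:
                # Orphan event: the case was never seen or has been
                # forgotten, so regenerate a plausible prefix from the model.
                self.cases[case_id] = impute_prefix(activity)
            else:
                self.cases[case_id] = []
        self.cases[case_id].append(activity)
        self.cases.move_to_end(case_id)
        while len(self.cases) > self.max_cases:
            self.cases.popitem(last=False)  # forget the oldest case
        return self.cases[case_id]
```

With a window of one case, observing event `c` for an unseen case yields the regenerated trace `["a", "b", "c"]` instead of the misleading single-event case a naïve approach would produce.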
We evaluate the efficacy of our approach through multiple experiments on a synthetic event log and three real event logs in a simulated streaming setting. Our approach achieves considerably more realistic conformance statistics than the state of the art while requiring the same amount of storage.