Sommers Dominique, Sidorova Natalia, van Dongen Boudewijn
Mathematics and Computer Science, Eindhoven University of Technology, Eindhoven, the Netherlands.
Process Sci. 2025;2(1):1. doi: 10.1007/s44311-025-00006-8. Epub 2025 Mar 20.
The assessment of process mining techniques using real-life data is often compromised by the lack of ground truth knowledge, the presence of non-essential outliers in system behavior, and recording errors in event logs. Using synthetically generated data instead makes it possible to evaluate against a known ground truth. Existing log generation tools inject noise directly into the logs, which fails to capture many typical behavioral deviations. Furthermore, the link between the model and the log, which is needed for later assessment, is lost. We propose a ground-truth approach for generating process data from existing or synthetic initial process models, whether automatically generated or hand-made. This approach incorporates patterns of behavioral deviations and recording errors to produce a synthetic yet realistic deviating model and imperfect event log. These, together with the initial model, are required to assess process mining techniques based on ground truth knowledge. We demonstrate this approach by creating datasets of synthetic process data for three processes, one of which we use in a conformance checking use case, focusing on the assessment of (relaxed) systemic alignments to expose and explain deviations in modeled and recorded behavior. Our results show that this approach, unlike traditional methods, provides detailed insights into the strengths and weaknesses of process mining techniques, both quantitatively and qualitatively.
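The distinction between a behavioral deviation and a recording error, and the value of keeping the link to ground truth, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the sequential toy model, the function `simulate`, the parameters `skip_prob` and `drop_prob`, and the label names are all hypothetical, and the paper's approach works with deviation patterns on process models rather than per-event coin flips.

```python
import random

# Hypothetical toy model: a strictly sequential process (an assumption for illustration).
INITIAL_MODEL = ["register", "check", "decide", "notify"]


def simulate(model, n_cases, skip_prob, drop_prob, seed=0):
    """Generate a toy event log together with per-event ground-truth labels.

    skip_prob: chance that an activity is not executed (behavioral deviation).
    drop_prob: chance that an executed activity is not logged (recording error).
    Returns (log, truth), where log[i] is the recorded trace of case i and
    truth[i] lists (activity, label) pairs for every activity of the model.
    """
    rng = random.Random(seed)
    log, truth = [], []
    for _ in range(n_cases):
        recorded, labels = [], []
        for activity in model:
            if rng.random() < skip_prob:
                # The activity never happened: the behavior itself deviates.
                labels.append((activity, "skipped"))
            elif rng.random() < drop_prob:
                # The activity happened but left no event: the log is imperfect.
                labels.append((activity, "unrecorded"))
            else:
                labels.append((activity, "ok"))
                recorded.append(activity)
        log.append(recorded)
        truth.append(labels)
    return log, truth


log, truth = simulate(INITIAL_MODEL, n_cases=50, skip_prob=0.1, drop_prob=0.1)
```

In the recorded trace a skipped activity and an unrecorded one look identical: the event is simply missing. Only the retained labels tell them apart, which is the kind of information an evaluation of a conformance checking technique needs and which is lost when noise is injected into a log after the fact.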