Marin-Castro Heidy M, Morales-Sandoval Miguel, González-Compean José Luis, Hernandez Julio
Universidad de las Américas, Cholula, Puebla, Mexico.
Computer Science, Instituto Nacional de Astrofísica, Óptica y Electrónica, Tonantzintla, Puebla, Mexico.
PeerJ Comput Sci. 2024 Dec 18;10:e2601. doi: 10.7717/peerj-cs.2601. eCollection 2024.
It is crucial for organizations to ensure that their business processes are executed accurately and comply with internal policies and requirements. Process mining is a discipline of data science that exploits business process execution data to analyze and improve business processes. It provides a data-driven approach to understanding how processes actually work in practice. Conformance checking is one of the three most relevant process mining tasks. It consists of determining the degree of correspondence or deviation between the expected (or modeled) behavior of a process and the real behavior observed and revealed from the historical events recorded in an event log during the execution of each instance of the process. In a big data scenario, traditional conformance checking methods struggle to analyze the instances or traces in large event logs, increasing the associated computational cost. In this article, we study and address the conformance checking task supported by a trace selection approach that uses a representative sample of the event log, thus reducing processing time and computational cost without losing confidence in the obtained conformance value. As main contributions, we present a novel conformance checking method that (i) takes into account the data dispersion in the event log using a statistical measure, (ii) determines the size of the representative sample of the event log for the conformance checking task, and (iii) establishes trace selection criteria based on the dispersion level. The method was validated and evaluated using fitness, precision, generalization, and processing time metrics through experiments on three real event logs from the health domain and two synthetic event logs. The experimental evaluation and results revealed the effectiveness of our method in addressing the problem of conformance between a process model and its corresponding large event log.
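To illustrate the general idea of dispersion-aware trace sampling, the following minimal Python sketch measures the dispersion of trace-variant frequencies with the coefficient of variation, derives a sample size with Cochran's formula (with a finite-population correction), and applies a simple selection criterion. This is a hypothetical illustration under assumed choices (coefficient of variation as the dispersion statistic, Cochran's formula, a frequency-biased vs. uniform selection rule), not the authors' exact method.

```python
import math
import random
from collections import Counter

def sample_traces(event_log, z=1.96, margin=0.05, seed=0):
    """Select a representative sample of traces from an event log.

    event_log: list of traces, each trace a tuple of activity names.
    Hypothetical sketch: dispersion is the coefficient of variation of
    trace-variant frequencies; sample size comes from Cochran's formula
    with a finite-population correction.
    """
    variants = Counter(event_log)  # trace variant -> frequency
    freqs = list(variants.values())
    n_total = len(event_log)

    # Dispersion measure: coefficient of variation of variant frequencies.
    mean = sum(freqs) / len(freqs)
    var = sum((f - mean) ** 2 for f in freqs) / len(freqs)
    cv = math.sqrt(var) / mean if mean else 0.0

    # Sample size: Cochran's formula for a proportion (worst case p = 0.5),
    # corrected for the finite population of n_total traces.
    n0 = (z ** 2 * 0.25) / (margin ** 2)
    n = math.ceil(n0 / (1 + (n0 - 1) / n_total))

    # Selection criterion (assumed): with high dispersion, favor the most
    # frequent variants; otherwise draw a uniform random sample.
    rng = random.Random(seed)
    if cv > 1.0:
        population = sorted(event_log, key=lambda t: -variants[t])
        sample = population[:n]
    else:
        sample = rng.sample(event_log, min(n, n_total))
    return sample, cv, n
```

The sampled traces would then be replayed against the process model with a standard conformance technique (e.g., token-based replay or alignments), so the fitness estimate is computed on the sample rather than the full log.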