Chen Shuhong, Yang Sen, Zhou Moliang, Burd Randall S, Marsic Ivan
Rutgers University, NJ, USA.
Children's National Medical Center, Washington, D.C., USA.
IEEE Int Conf Data Min Workshops. 2017 Nov;2017:438-445. doi: 10.1109/ICDMW.2017.63. Epub 2017 Dec 18.
Adapted from biological sequence alignment, trace alignment is a process mining technique used to visualize and analyze workflow data. Any analysis done with this method, however, is affected by the alignment quality. The best existing trace alignment techniques use progressive guide-trees to heuristically approximate the optimal alignment in O(N^2 L^2) time. These algorithms depend heavily on the selected guide-tree metric, often return errors that reduce the sum-of-pairs score and interfere with interpretation, and are computationally intensive for large datasets. To alleviate these issues, we propose process-oriented iterative multiple alignment (PIMA), which contains specialized optimizations to better handle workflow data. We demonstrate that PIMA is a flexible framework capable of achieving a better sum-of-pairs score than existing trace alignment algorithms in only O(N L^2) time. We applied PIMA to medical workflow data, showing how iterative alignment can better represent the data and facilitate the extraction of insights from data visualization.
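The abstract uses the sum-of-pairs score as its measure of alignment quality. As a rough illustration only, the sketch below computes such a score for a small set of already-aligned traces. The scoring weights (match +1, mismatch -1, activity-versus-gap -1, gap-versus-gap 0), the gap symbol, and the function names `pair_score` and `sum_of_pairs` are assumptions chosen for this example; they are not taken from the paper, and PIMA's actual scoring scheme may differ.

```python
from itertools import combinations

GAP = "-"  # assumed gap symbol; the paper's notation may differ


def pair_score(a, b, match=1, mismatch=-1, gap=-1):
    """Score one pair of aligned cells (hypothetical weights)."""
    if a == GAP and b == GAP:
        return 0  # two gaps carry no information
    if a == GAP or b == GAP:
        return gap  # an activity aligned against a gap
    return match if a == b else mismatch


def sum_of_pairs(alignment, **weights):
    """Sum pair_score over every pair of rows in every column."""
    if not alignment:
        return 0
    width = len(alignment[0])
    if any(len(row) != width for row in alignment):
        raise ValueError("all aligned traces must have the same length")
    total = 0
    for column in zip(*alignment):  # walk the alignment column by column
        for a, b in combinations(column, 2):
            total += pair_score(a, b, **weights)
    return total


# Three aligned traces over the activities A, B, C, D (made-up data).
alignment = [
    ["A", "B", "C", "-"],
    ["A", "-", "C", "D"],
    ["A", "B", "-", "D"],
]
score = sum_of_pairs(alignment)
```

Under these assumed weights, an alignment with fewer gaps and mismatches per column scores higher, which is the sense in which one alignment can be called better than another.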