Process-oriented Iterative Multiple Alignment for Medical Process Mining.

Authors

Chen Shuhong, Yang Sen, Zhou Moliang, Burd Randall S, Marsic Ivan

Affiliations

Rutgers University, NJ, USA.

Children's National Medical Center, Washington, D.C., USA.

Publication information

IEEE Int Conf Data Min Workshops. 2017 Nov;2017:438-445. doi: 10.1109/ICDMW.2017.63. Epub 2017 Dec 18.

DOI:10.1109/ICDMW.2017.63

PMID:30364463

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6196034/

Abstract

Adapted from biological sequence alignment, trace alignment is a process mining technique used to visualize and analyze workflow data. Any analysis done with this method, however, is affected by the alignment quality. The best existing trace alignment techniques use progressive guide-trees to heuristically approximate the optimal alignment in O(N^2 L^2) time. These algorithms are heavily dependent on the selected guide-tree metric, often return errors that reduce the sum-of-pairs score and interfere with interpretation, and are computationally intensive for large datasets. To alleviate these issues, we propose process-oriented iterative multiple alignment (PIMA), which contains specialized optimizations to better handle workflow data. We demonstrate that PIMA is a flexible framework capable of achieving a better sum-of-pairs score than existing trace alignment algorithms in only O(N L^2) time. We applied PIMA to analyzing medical workflow data, showing how iterative alignment can better represent the data and facilitate the extraction of insights from data visualization.
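The abstract uses the sum-of-pairs score as its measure of alignment quality without defining it. The following is a minimal sketch of one common formulation, not the scoring scheme used in the paper: the gap symbol, the match/mismatch/gap values, the function names `pair_score` and `sum_of_pairs`, and the toy traces are all assumptions made for illustration.

```python
from itertools import combinations

GAP = "-"  # assumed gap symbol; the paper may use a different convention


def pair_score(a, b, match=1, mismatch=-1, gap=-1):
    """Score one pair of aligned symbols (illustrative values only)."""
    if a == GAP and b == GAP:
        return 0  # two gaps carry no information in this sketch
    if a == GAP or b == GAP:
        return gap
    return match if a == b else mismatch


def sum_of_pairs(alignment):
    """Add pair_score over every pair of rows in every column."""
    if len({len(row) for row in alignment}) > 1:
        raise ValueError("all aligned traces must have the same length")
    total = 0
    for column in zip(*alignment):
        for a, b in combinations(column, 2):
            total += pair_score(a, b)
    return total


# Two hypothetical alignments of the same three activity traces.
good = [
    ["A", "B", "C", "D"],
    ["A", "-", "C", "D"],
    ["A", "B", "-", "D"],
]
poor = [
    ["A", "B", "C", "D", "-"],
    ["-", "A", "C", "D", "-"],
    ["A", "B", "-", "-", "D"],
]

print(sum_of_pairs(good))  # 4
print(sum_of_pairs(poor))  # -6
```

Under these assumed scores, the alignment that keeps identical activities in the same column scores higher, which is the sense in which a misplaced gap counts as a score-reducing error.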
