Bogetti Anthony T, Leung Jeremy M G, Chong Lillian T
Department of Chemistry, University of Pittsburgh, Pittsburgh, Pennsylvania 15260.
bioRxiv. 2023 Oct 19:2023.08.17.553774. doi: 10.1101/2023.08.17.553774.
The pathways by which a molecular process transitions to a target state are highly sought after as direct views of the transition mechanism. While great strides have been made in the physics-based simulation of such pathways, analyzing them can be a major challenge because the pathways are diverse and vary in length. Here we present the LPATH Python tool, which implements a semi-automated method for linguistics-assisted clustering of pathways into distinct classes (or routes). This method involves three steps: 1) discretizing the configurational space into key states, 2) extracting a text-string sequence of key visited states for each pathway, and 3) pairwise matching of pathways based on a text-string similarity score. To circumvent the prohibitive memory requirements of the first step, we have implemented a general two-stage method for clustering conformational states that exploits machine learning. LPATH is primarily designed for use with the WESTPA software for weighted ensemble simulations; however, the tool can also be applied to conventional simulations. As demonstrated for the C7eq to C7ax conformational transition of alanine dipeptide, LPATH provides physically reasonable classes of pathways and corresponding probabilities.
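The three steps named in the abstract can be illustrated with a minimal sketch. This is not the LPATH API. The one-dimensional coordinate, the state boundaries, and the function names below are all hypothetical, and the similarity score uses the `SequenceMatcher` ratio from Python's standard `difflib` module as a stand-in, since the abstract does not specify the metric.

```python
from difflib import SequenceMatcher
from itertools import groupby


def assign_state(x):
    """Step 1 (toy): map a 1-D coordinate to a key state.

    The boundaries are hypothetical; a real application would derive
    states from clustering of the configurational space.
    """
    if x < 1.0:
        return "A"
    if x < 2.0:
        return "B"
    if x < 3.0:
        return "C"
    return "D"


def pathway_to_string(coords):
    """Step 2: label each frame, then collapse consecutive repeats so that
    pathways of different lengths reduce to their sequence of visited states."""
    labels = [assign_state(x) for x in coords]
    return "".join(label for label, _ in groupby(labels))


def similarity(s1, s2):
    """Step 3: text-string similarity in [0, 1] (assumed metric: difflib ratio)."""
    return SequenceMatcher(None, s1, s2).ratio()


def distance_matrix(strings):
    """Pairwise distances (1 - similarity), suitable as input to a clustering step."""
    n = len(strings)
    return [[1.0 - similarity(strings[i], strings[j]) for j in range(n)]
            for i in range(n)]


# Three toy pathways of different lengths along the hypothetical coordinate.
paths = [
    [0.2, 0.5, 1.3, 1.8, 3.4],  # visits A, B, D
    [0.1, 1.1, 1.2, 3.9],       # visits A, B, D
    [0.4, 2.5, 2.7, 3.2],       # visits A, C, D
]
strings = [pathway_to_string(p) for p in paths]
dist = distance_matrix(strings)
```

In this toy case the first two pathways reduce to the same string and have zero distance, while the third passes through a different intermediate state and lies farther away. A clustering algorithm applied to `dist` would then separate the pathways into classes.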