Size-and-Shape Space Gaussian Mixture Models for Structural Clustering of Molecular Dynamics Trajectories.

Affiliations

Department of Chemistry, Colorado State University, Fort Collins, Colorado 80523, United States.

Department of Chemistry, New York University, New York, New York 10003, United States.

Publication information

J Chem Theory Comput. 2022 May 10;18(5):3218-3230. doi: 10.1021/acs.jctc.1c01290. Epub 2022 Apr 28.

DOI:10.1021/acs.jctc.1c01290

PMID:35483073

Full text link: https://pmc.ncbi.nlm.nih.gov/articles/PMC9228201/

Abstract

Determining the optimal number and identity of structural clusters from an ensemble of molecular configurations continues to be a challenge. Recent structural clustering methods have focused on the use of internal coordinates due to the innate rotational and translational invariance of these features. The vast number of possible internal coordinates necessitates a feature space supervision step to make clustering tractable but yields a protocol that can be system type-specific. Particle positions offer an appealing alternative to internal coordinates but suffer from a lack of rotational and translational invariance, as well as a perceived insensitivity to regions of structural dissimilarity. Here, we present a method, denoted shape-GMM, that overcomes the shortcomings of particle positions using a weighted maximum likelihood alignment procedure. This alignment strategy is then built into an expectation maximization Gaussian mixture model (GMM) procedure to capture metastable states in the free-energy landscape. The resulting algorithm distinguishes between a variety of different structures, including those indistinguishable by root-mean-square displacement and pairwise distances, as demonstrated on several model systems. Shape-GMM results on an extensive simulation of the fast-folding HP35 Nle/Nle mutant protein support a four-state folding/unfolding mechanism, which is consistent with previous experimental results and provides kinetic details comparable to previous state-of-the-art clustering approaches, as measured by the VAMP-2 score. Currently, training of shape-GMMs is recommended for systems (or subsystems) that can be represented by ≲200 particles and ≲100k configurations to estimate high-dimensional covariance matrices and balance computational expense. Once a shape-GMM is trained, it can be used to predict the cluster identities of millions of configurations.
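
The abstract notes that raw particle positions lack rotational and translational invariance, so they must be aligned before they can be clustered. The sketch below illustrates that point with a standard uniform-weight Kabsch superposition in NumPy. It is a simplified stand-in, not the weighted maximum likelihood alignment that shape-GMM uses, and the names `kabsch_align` and `rmsd` are hypothetical helpers introduced only for this illustration.

```python
import numpy as np


def kabsch_align(mobile, reference):
    """Rigidly superpose `mobile` (N x 3) onto `reference` (N x 3).

    Assumption: uniform particle weights (plain Kabsch), not the weighted
    maximum likelihood alignment described in the abstract.
    """
    # Remove translation by centering both structures.
    mobile_centered = mobile - mobile.mean(axis=0)
    reference_center = reference.mean(axis=0)
    reference_centered = reference - reference_center

    # 3 x 3 cross-covariance between the two centered structures.
    covariance = mobile_centered.T @ reference_centered
    u, _, vt = np.linalg.svd(covariance)

    # Guard against an improper rotation (reflection).
    handedness = np.sign(np.linalg.det(vt.T @ u.T))
    rotation = vt.T @ np.diag([1.0, 1.0, handedness]) @ u.T

    # Rotate the mobile structure and move it onto the reference center.
    return mobile_centered @ rotation.T + reference_center


def rmsd(a, b):
    """Root-mean-square displacement between two N x 3 coordinate sets."""
    return float(np.sqrt(((a - b) ** 2).sum(axis=1).mean()))


# Toy data: the same 10-particle structure, rotated about z and translated.
rng = np.random.default_rng(0)
reference = rng.normal(size=(10, 3))
angle = 0.7
rot_z = np.array([
    [np.cos(angle), -np.sin(angle), 0.0],
    [np.sin(angle), np.cos(angle), 0.0],
    [0.0, 0.0, 1.0],
])
moved = reference @ rot_z.T + np.array([1.0, -2.0, 0.5])

aligned = kabsch_align(moved, reference)
rmsd_before = rmsd(moved, reference)    # large: positions are not invariant
rmsd_after = rmsd(aligned, reference)   # near zero: same shape after alignment
```

Before alignment the two copies look different even though they are the same structure; after alignment the displacement vanishes to numerical precision. A mixture model fitted to unaligned positions would therefore separate frames by orientation rather than by shape, which is why the alignment step sits inside the clustering procedure.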
