Fast conformational clustering of extensive molecular dynamics simulation data.

Author information

Department of Chemistry, University of Konstanz, Konstanz, Germany.

Theory Department, Max Planck Institute for Polymer Research, Mainz, Germany.

Publication information

J Chem Phys. 2023 Apr 14;158(14):144109. doi: 10.1063/5.0142797.

DOI:10.1063/5.0142797

PMID:37061476

Abstract

We present an unsupervised data processing workflow designed for fast conformational clustering of long molecular dynamics simulation trajectories. The approach combines two dimensionality reduction algorithms (cc_analysis and encodermap) with a density-based clustering algorithm (hierarchical density-based spatial clustering of applications with noise, HDBSCAN). The proposed scheme benefits from the strengths of the three algorithms while avoiding most of the drawbacks of the individual methods. Here, the cc_analysis algorithm is applied to molecular simulation data for the first time. The encodermap algorithm complements cc_analysis by providing an efficient way to process large amounts of data and assign them to clusters. The main goal of the procedure is to maximize the number of assigned frames of a given trajectory while keeping a clear conformational identity for the clusters that are found. In practice, we achieve this with an iterative clustering approach and a tunable root-mean-square-deviation-based criterion in the final cluster assignment. This allows us to find clusters of different densities and different degrees of structural identity. We illustrate the capability and performance of this clustering workflow with four protein systems: the wild type and a thermostable mutant of the Trp-cage protein (TC5b and TC10b), NTL9, and Protein B. Each of these test systems poses its own challenges to the scheme; together they give an overview of the advantages and potential difficulties that can arise when using the proposed method.
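
As a rough illustration of the tunable RMSD-based criterion mentioned in the abstract, the sketch below assigns each frame to its nearest cluster representative only when the RMSD falls below a cutoff, and leaves all other frames unassigned. The function names, the toy coordinates, the cutoff value, and the assumption that all structures are already superimposed are illustrative choices, not details taken from the paper.

```python
import math


def rmsd(a, b):
    """RMSD between two conformations given as equal-length lists of (x, y, z).

    Assumption: the structures are already superimposed; no fitting is done here.
    """
    if len(a) != len(b):
        raise ValueError("conformations must have the same number of atoms")
    squared = sum((p - q) ** 2 for u, v in zip(a, b) for p, q in zip(u, v))
    return math.sqrt(squared / len(a))


def assign_frames(frames, representatives, cutoff):
    """Label each frame with the index of its nearest representative.

    Frames whose smallest RMSD exceeds `cutoff` get the label -1 (unassigned).
    """
    labels = []
    for frame in frames:
        dists = [rmsd(frame, rep) for rep in representatives]
        best = min(range(len(dists)), key=dists.__getitem__)
        labels.append(best if dists[best] <= cutoff else -1)
    return labels


# Hypothetical two-atom "conformations" used only to exercise the functions.
rep_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
rep_b = [(0.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
frames = [
    [(0.0, 0.0, 0.0), (1.1, 0.0, 0.0)],  # close to rep_a
    [(0.0, 0.0, 0.0), (0.0, 0.9, 0.0)],  # close to rep_b
    [(0.0, 0.0, 0.0), (3.0, 3.0, 0.0)],  # far from both
]
labels = assign_frames(frames, [rep_a, rep_b], cutoff=0.2)
print(labels)  # [0, 1, -1]
```

In this toy setting, raising `cutoff` assigns more frames at the cost of looser structural identity within each cluster, which is the trade-off the abstract describes.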

Similar articles

Folic acid supplementation and malaria susceptibility and severity among people taking antifolate antimalarial drugs in endemic areas.

Cochrane Database Syst Rev. 2022 Feb 1;2(2022):CD014217. doi: 10.1002/14651858.CD014217.

Clustering Molecular Dynamics Trajectories: 1. Characterizing the Performance of Different Clustering Algorithms.

J Chem Theory Comput. 2007 Nov;3(6):2312-34. doi: 10.1021/ct700119m.

MDSCAN: RMSD-based HDBSCAN clustering of long molecular dynamics.

Bioinformatics. 2022 Nov 30;38(23):5191-5198. doi: 10.1093/bioinformatics/btac666.

Volume-scaled common nearest neighbor clustering algorithm with free-energy hierarchy.

J Chem Phys. 2021 Feb 28;154(8):084106. doi: 10.1063/5.0025797.

Cluster analysis of molecular simulation trajectories for systems where both conformation and orientation of the sampled states are important.

J Comput Chem. 2016 Aug 5;37(21):1973-82. doi: 10.1002/jcc.24416. Epub 2016 Jun 12.

EncoderMap: Dimensionality Reduction and Generation of Molecule Conformations.

J Chem Theory Comput. 2019 Feb 12;15(2):1209-1215. doi: 10.1021/acs.jctc.8b00975. Epub 2019 Jan 25.

Energy-based clustering: Fast and robust clustering of data with known likelihood functions.

J Chem Phys. 2023 Jul 14;159(2). doi: 10.1063/5.0148735.

Network visualization of conformational sampling during molecular dynamics simulation.

J Mol Graph Model. 2013 Nov;46:140-9. doi: 10.1016/j.jmgm.2013.10.003. Epub 2013 Oct 16.

Efficient Online Stream Clustering Based on Fast Peeling of Boundary Micro-Cluster.

IEEE Trans Neural Netw Learn Syst. 2025 Mar;36(3):5680-5693. doi: 10.1109/TNNLS.2024.3382033. Epub 2025 Feb 28.

Cited by

EncoderMap III: A Dimensionality Reduction Package for Feature Exploration in Molecular Simulations.

J Chem Inf Model. 2025 Sep 8;65(17):9000-9008. doi: 10.1021/acs.jcim.5c00887. Epub 2025 Aug 20.

Multi-Component Synthesis of New Fluorinated-Pyrrolo[3,4-]pyridin-5-ones Containing the 4-Amino-7-chloroquinoline Moiety and In Vitro-In Silico Studies Against Human SARS-CoV-2.

Int J Mol Sci. 2025 Aug 7;26(15):7651. doi: 10.3390/ijms26157651.

Artif Intell Chem. 2024 Dec;2(2). doi: 10.1016/j.aichem.2024.100077. Epub 2024 Aug 31.

Protocol for sequence clustering with PaSiMap in Jalview.

STAR Protoc. 2025 Mar 21;6(1):103603. doi: 10.1016/j.xpro.2025.103603. Epub 2025 Feb 27.

Visualizing the Residue Interaction Landscape of Proteins by Temporal Network Embedding.

J Chem Theory Comput. 2023 May 23;19(10):2985-2995. doi: 10.1021/acs.jctc.2c01228. Epub 2023 Apr 25.

Generating a conformational landscape of ubiquitin chains at atomistic resolution by back-mapping based sampling.

Front Chem. 2023 Jan 10;10:1087963. doi: 10.3389/fchem.2022.1087963. eCollection 2022.
