Improving the analysis of biological ensembles through extended similarity measures.

Affiliations

Department of Chemistry, University of Florida, Gainesville, FL, 32611, USA.

Quantum Theory Project, University of Florida, Gainesville, FL, 32611, USA.

Publication information

Phys Chem Chem Phys. 2021 Dec 22;24(1):444-451. doi: 10.1039/d1cp04019g.

DOI:10.1039/d1cp04019g

PMID:34897334

Abstract

We present new algorithms to classify structural ensembles of macromolecules based on the recently proposed extended similarity measures. Molecular dynamics provides a wealth of structural information on systems of biological interest. As computer power increases, we capture larger ensembles and larger conformational transitions between states. Typically, structural clustering provides the statistical mechanics treatment of the system to identify relevant biological states. The key advantage of our approach is that the newly introduced extended similarity indices reduce the computational complexity of assessing the similarity of a set of structures from O(N²) to O(N). Here we take advantage of this favorable cost to develop several highly efficient techniques, including a linear-scaling algorithm to determine the medoid of a set (which we effectively use to select the most representative structure of a cluster). Moreover, we use our extended similarity indices as a linkage criterion in a novel hierarchical agglomerative clustering algorithm. We apply these new metrics to analyze the ensembles of several systems of biological interest such as folding and binding of macromolecules (peptide, protein, DNA-protein). In particular, we design a new workflow that is capable of identifying the most important conformations contributing to the protein folding process. We show excellent performance in the resulting clusters (surpassing traditional linkage criteria), along with faster performance and an efficient cost-function to identify when to merge clusters.
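The linear-scaling idea in the abstract can be illustrated with a minimal sketch. It does not implement the paper's extended similarity indices; as a stand-in, it assumes each structure is flattened to a numeric feature vector and uses squared Euclidean distance, for which per-feature column sums give set-level quantities in a single pass. The names `column_sums`, `mean_pairwise_sq_dist`, and `medoid_index` are hypothetical and chosen for this illustration only.

```python
from typing import List, Sequence

# Assumption: one structure (frame) is a flat vector of coordinates or features.
Frame = Sequence[float]


def column_sums(frames: Sequence[Frame]) -> List[float]:
    """Sum every feature over all frames in one pass over the data."""
    sums = [0.0] * len(frames[0])
    for frame in frames:
        for k, value in enumerate(frame):
            sums[k] += value
    return sums


def mean_pairwise_sq_dist(frames: Sequence[Frame]) -> float:
    """Mean squared distance over all pairs without visiting any pair.

    Uses the identity  sum_{i<j} |x_i - x_j|^2 = N * Q - |S|^2,
    where S is the vector of column sums and Q the sum of squared norms.
    """
    n = len(frames)
    if n < 2:
        return 0.0
    s = column_sums(frames)
    q = sum(v * v for frame in frames for v in frame)
    total = n * q - sum(v * v for v in s)
    return total / (n * (n - 1) / 2)


def medoid_index(frames: Sequence[Frame]) -> int:
    """Index of the frame with the smallest summed squared distance to the rest.

    For frame x_i:  sum_j |x_i - x_j|^2 = N * |x_i|^2 - 2 * x_i . S + Q.
    Q is the same for every frame, so it is dropped from the comparison.
    """
    n = len(frames)
    s = column_sums(frames)
    best_index, best_cost = 0, float("inf")
    for i, frame in enumerate(frames):
        sq_norm = sum(v * v for v in frame)
        dot_with_sums = sum(v * t for v, t in zip(frame, s))
        cost = n * sq_norm - 2.0 * dot_with_sums
        if cost < best_cost:
            best_index, best_cost = i, cost
    return best_index


if __name__ == "__main__":
    toy = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [9.0, 9.0]]
    print(medoid_index(toy), mean_pairwise_sq_dist(toy))
```

Both functions touch each frame a constant number of times, so the cost grows linearly with the number of frames rather than quadratically as in an explicit pairwise comparison. Whether squared Euclidean distance is a suitable proxy for a given ensemble is a separate modeling choice.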

Similar articles

Improving the analysis of biological ensembles through extended similarity measures.

Phys Chem Chem Phys. 2021 Dec 22;24(1):444-451. doi: 10.1039/d1cp04019g.

Protein Retrieval via Integrative Molecular Ensembles (PRIME) through Extended Similarity Indices.

J Chem Theory Comput. 2024 Jul 23;20(14):6303-6315. doi: 10.1021/acs.jctc.4c00362. Epub 2024 Jul 8.

Macromolecular crowding: chemistry and physics meet biology (Ascona, Switzerland, 10-14 June 2012).

Phys Biol. 2013 Aug;10(4):040301. doi: 10.1088/1478-3975/10/4/040301. Epub 2013 Aug 2.

Molecular Dynamics Simulations and Diversity Selection by Extended Continuous Similarity Indices.

J Chem Inf Model. 2022 Jul 25;62(14):3415-3425. doi: 10.1021/acs.jcim.2c00433. Epub 2022 Jul 14.

Conformational and functional analysis of molecular dynamics trajectories by self-organising maps.

BMC Bioinformatics. 2011 May 14;12:158. doi: 10.1186/1471-2105-12-158.

J Cheminform. 2021 Apr 23;13(1):33. doi: 10.1186/s13321-021-00504-4.

Cross-over between discrete and continuous protein structure space: insights into automatic classification and networks of protein structures.

PLoS Comput Biol. 2009 Mar;5(3):e1000331. doi: 10.1371/journal.pcbi.1000331. Epub 2009 Mar 27.

Artificial neural networks for efficient clustering of conformational ensembles and their potential for medicinal chemistry.

Curr Top Med Chem. 2013;13(5):642-51. doi: 10.2174/1568026611313050007.

Eurecon: Equidistant uniform rigid-body ensemble constructor.

J Mol Graph Model. 2018 Mar;80:313-319. doi: 10.1016/j.jmgm.2018.01.015. Epub 2018 Feb 2.

An Effective Approach for Clustering InhA Molecular Dynamics Trajectory Using Substrate-Binding Cavity Features.

PLoS One. 2015 Jul 28;10(7):e0133172. doi: 10.1371/journal.pone.0133172. eCollection 2015.

Cited by

Undersampling techniques for non-linear chemical space visualization.

bioRxiv. 2025 Jul 7:2025.07.03.663077. doi: 10.1101/2025.07.03.663077.

Scaling k-Means for Multi-Million Frames: A Stratified NANI Approach for Large-Scale MD Simulations.

bioRxiv. 2025 Jun 18:2025.06.15.659780. doi: 10.1101/2025.06.15.659780.

SHINE: Deterministic Many-to-Many Clustering of Molecular Pathways.

J Chem Inf Model. 2025 May 26;65(10):4775-4782. doi: 10.1021/acs.jcim.5c00240. Epub 2025 May 6.

Extended Quality (eQual): Radial Threshold Clustering Based on n-ary Similarity.

J Chem Inf Model. 2025 May 26;65(10):5062-5070. doi: 10.1021/acs.jcim.4c02341. Epub 2025 May 1.

Hierarchical Extended Linkage Method (HELM)'s Deep Dive into Hybrid Clustering Strategies.

bioRxiv. 2025 Mar 10:2025.03.05.641742. doi: 10.1101/2025.03.05.641742.

Artif Intell Chem. 2024 Dec;2(2). doi: 10.1016/j.aichem.2024.100077. Epub 2024 Aug 31.

BitBIRCH: efficient clustering of large molecular libraries.

Digit Discov. 2025 Mar 13;4(4):1042-1051. doi: 10.1039/d5dd00030k. eCollection 2025 Apr 9.

SHINE: Deterministic Many-to-Many clustering of Molecular Pathways.

bioRxiv. 2025 Feb 8:2025.02.07.636541. doi: 10.1101/2025.02.07.636541.

Extended Quality (eQual): Radial threshold clustering based on n-ary similarity.

bioRxiv. 2024 Dec 5:2024.12.05.627001. doi: 10.1101/2024.12.05.627001.

Extended Activity Cliffs-Driven Approaches on Data Splitting for the Study of Bioactivity Machine Learning Predictions.

Mol Inform. 2025 Jan;44(1):e202400054. doi: 10.1002/minf.202400054. Epub 2024 Nov 18.
