Improving the analysis of biological ensembles through extended similarity measures.

Author information

Department of Chemistry, University of Florida, Gainesville, FL, 32611, USA.

Quantum Theory Project, University of Florida, Gainesville, FL, 32611, USA.

Publication information

Phys Chem Chem Phys. 2021 Dec 22;24(1):444-451. doi: 10.1039/d1cp04019g.

Abstract

We present new algorithms to classify structural ensembles of macromolecules based on the recently proposed extended similarity measures. Molecular dynamics provides a wealth of structural information on systems of biological interest. As computer power increases, we capture larger ensembles and larger conformational transitions between states. Typically, structural clustering provides the statistical mechanics treatment of the system to identify relevant biological states. The key advantage of our approach is that the newly introduced extended similarity indices reduce the computational complexity of assessing the similarity of a set of N structures from O(N^2) to O(N). Here we take advantage of this favorable cost to develop several highly efficient techniques, including a linear-scaling algorithm to determine the medoid of a set (which we effectively use to select the most representative structure of a cluster). Moreover, we use our extended similarity indices as a linkage criterion in a novel hierarchical agglomerative clustering algorithm. We apply these new metrics to analyze the ensembles of several systems of biological interest, including the folding and binding of macromolecules (peptide, protein, and DNA-protein systems). In particular, we design a new workflow that is capable of identifying the most important conformations contributing to the protein folding process. We show excellent performance in the resulting clusters (surpassing traditional linkage criteria), along with faster performance and an efficient cost function to identify when to merge clusters.
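The linear-scaling medoid idea mentioned in the abstract can be illustrated with a small sketch. This is not the paper's extended similarity index: as a stand-in, it uses the squared Euclidean distance, for which the sum of distances from one point to all others can be rewritten in terms of per-coordinate sums over the whole set. The function names (`medoid_linear`, `medoid_pairwise`, `summed_sq_dist`) and the toy data are hypothetical and chosen only for illustration.

```python
def summed_sq_dist(points, i):
    """Sum of squared Euclidean distances from points[i] to every point."""
    return sum(
        sum((a - b) ** 2 for a, b in zip(points[i], q)) for q in points
    )


def medoid_pairwise(points):
    """Reference medoid: compares every pair, so the cost grows as O(N^2)."""
    return min(range(len(points)), key=lambda i: summed_sq_dist(points, i))


def medoid_linear(points):
    """Medoid under squared Euclidean distance in O(N) passes over the set.

    Uses the identity
        sum_j |x_i - x_j|^2 = N*|x_i|^2 - 2*(x_i . S) + Q,
    where S is the per-coordinate sum of all points and Q is the sum of all
    squared norms. Q is the same for every i, so it can be dropped when
    ranking candidates.
    """
    n = len(points)
    dim = len(points[0])

    # First pass: accumulate the per-coordinate sums S.
    col_sum = [0.0] * dim
    for p in points:
        for k, v in enumerate(p):
            col_sum[k] += v

    # Second pass: score each point using only S, never another point.
    best_i, best_score = 0, float("inf")
    for i, p in enumerate(points):
        score = n * sum(v * v for v in p) - 2.0 * sum(
            v * s for v, s in zip(p, col_sum)
        )
        if score < best_score:
            best_i, best_score = i, score
    return best_i


# Toy example: three points on a line; the middle one is the medoid.
toy = [[0.0, 0.0], [1.0, 0.0], [10.0, 0.0]]
print(medoid_linear(toy))  # index 1
```

Both functions select the same structure, but `medoid_linear` touches each point a fixed number of times, which is the kind of saving the abstract attributes to set-level (rather than pairwise) similarity measures.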
