Analysis of basic clustering algorithms for numerical estimation of statistical averages in biomolecules.

Authors

Anandakrishnan Ramu, Onufriev Alexey

Affiliation

Department of Computer Science, Virginia Tech, Blacksburg, Virginia 24061, USA.

Publication information

J Comput Biol. 2008 Mar;15(2):165-84. doi: 10.1089/cmb.2007.0144.

DOI:10.1089/cmb.2007.0144

PMID:18312148

Abstract

In statistical mechanics, the equilibrium properties of a physical system of particles can be calculated as the statistical average over accessible microstates of the system. In general, these calculations are computationally intractable since they involve summations over an exponentially large number of microstates. Clustering algorithms are one of the methods used to numerically approximate these sums. The most basic clustering algorithms first subdivide the system into a set of smaller subsets (clusters). Then, interactions between particles within each cluster are treated exactly, while all interactions between different clusters are ignored. These smaller clusters have far fewer microstates, making the summation over these microstates tractable. These algorithms have been previously used for biomolecular computations, but remain relatively unexplored in this context. Presented here is a theoretical analysis of the error and computational complexity for the two most basic clustering algorithms that were previously applied in the context of biomolecular electrostatics. We derive a tight, computationally inexpensive error bound for the equilibrium state of a particle computed via these clustering algorithms. For some practical applications, it is the root mean square error, which can be significantly lower than the error bound, that may be more important. We show that there is a strong empirical relationship between error bound and root mean square error, suggesting that the error bound could be used as a computationally inexpensive metric for predicting the accuracy of clustering algorithms for practical applications. An example of error analysis for one such application, the computation of the average charge of ionizable amino acids in proteins, is given, demonstrating that the clustering algorithm can be accurate enough for practical purposes.
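The basic idea described in the abstract (treat interactions inside each cluster exactly, ignore interactions between clusters) can be illustrated with a small sketch. This is not the authors' implementation: the two-state site model, the energy function, and the names `average_occupancy`, `clustered_occupancy`, `site_energies`, `couplings`, and `beta` are all assumptions made for illustration.

```python
import itertools
import math


def average_occupancy(site_energies, couplings, sites, beta=1.0):
    """Boltzmann-averaged occupancy of each site in `sites`.

    Assumed toy model (not from the paper): each site is 0 or 1, and a
    microstate x has energy
        E(x) = sum_i h_i * x_i + sum_{i<j} W_ij * x_i * x_j.
    All 2**len(sites) microstates are enumerated; sites outside `sites`
    are ignored entirely.
    """
    sites = list(sites)
    partition_sum = 0.0
    weighted = [0.0] * len(sites)
    for state in itertools.product((0, 1), repeat=len(sites)):
        energy = sum(site_energies[s] * x for s, x in zip(sites, state))
        # Pairwise interactions among the sites being enumerated.
        for (a, xa), (b, xb) in itertools.combinations(zip(sites, state), 2):
            energy += couplings[a][b] * xa * xb
        weight = math.exp(-beta * energy)
        partition_sum += weight
        for k, x in enumerate(state):
            weighted[k] += weight * x
    return {s: weighted[k] / partition_sum for k, s in enumerate(sites)}


def clustered_occupancy(site_energies, couplings, clusters, beta=1.0):
    """Approximate averages: exact within each cluster, with every
    interaction between different clusters dropped."""
    result = {}
    for cluster in clusters:
        result.update(average_occupancy(site_energies, couplings, cluster, beta))
    return result
```

Calling `average_occupancy` on all sites gives the exact reference, at the cost of enumerating every microstate of the full system. For four sites split into two clusters of two, the clustered version enumerates 2 × 2² = 8 microstates instead of 2⁴ = 16; the gap grows exponentially with system size. In this toy model the approximation is exact when the couplings between clusters are zero, and the error comes only from the couplings that are dropped.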

Similar articles

Analysis of basic clustering algorithms for numerical estimation of statistical averages in biomolecules.

J Comput Biol. 2008 Mar;15(2):165-84. doi: 10.1089/cmb.2007.0144.

A simple clustering algorithm can be accurate enough for use in calculations of pKs in macromolecules.

Proteins. 2006 Jun 1;63(4):928-38. doi: 10.1002/prot.20922.

The zero-multipole summation method for estimating electrostatic interactions in molecular dynamics: analysis of the accuracy and application to liquid systems.

J Chem Phys. 2014 May 21;140(19):194307. doi: 10.1063/1.4875693.

A robust information clustering algorithm.

Neural Comput. 2005 Dec;17(12):2672-98. doi: 10.1162/089976605774320548.

Metric for measuring the effectiveness of clustering of DNA microarray expression.

BMC Bioinformatics. 2006 Sep 6;7 Suppl 2(Suppl 2):S5. doi: 10.1186/1471-2105-7-S2-S5.

Analytical electrostatics for biomolecules: beyond the generalized Born approximation.

J Chem Phys. 2006 Mar 28;124(12):124902. doi: 10.1063/1.2177251.

Constructing irregular surfaces to enclose macromolecular complexes for mesoscale modeling using the discrete surface charge optimization (DISCO) algorithm.

J Comput Chem. 2003 Dec;24(16):2063-74. doi: 10.1002/jcc.10337.

Order N algorithm for computation of electrostatic interactions in biomolecular systems.

Proc Natl Acad Sci U S A. 2006 Dec 19;103(51):19314-9. doi: 10.1073/pnas.0605166103. Epub 2006 Dec 5.

EEG Microstate Sequences From Different Clustering Algorithms Are Information-Theoretically Invariant.

Front Comput Neurosci. 2018 Aug 27;12:70. doi: 10.3389/fncom.2018.00070. eCollection 2018.

A fast clustering algorithm for data with a few labeled instances.

Comput Intell Neurosci. 2015;2015:196098. doi: 10.1155/2015/196098. Epub 2015 Mar 11.

Cited by

Coumarin-Chalcone Hybrids as Inhibitors of MAO-B: Biological Activity and In Silico Studies.

Molecules. 2021 Apr 22;26(9):2430. doi: 10.3390/molecules26092430.

Functional roles of T3.37 and S5.46 in the activation mechanism of the dopamine D1 receptor.

J Mol Model. 2017 Apr;23(4):142. doi: 10.1007/s00894-017-3313-0. Epub 2017 Mar 31.

Characterizing a histidine switch controlling pH-dependent conformational changes of the influenza virus hemagglutinin.

Biophys J. 2013 Aug 20;105(4):993-1003. doi: 10.1016/j.bpj.2013.06.047.

DNA cytosine methylation: structural and thermodynamic characterization of the epigenetic marking mechanism.

Biochemistry. 2013 Apr 23;52(16):2828-38. doi: 10.1021/bi400163k. Epub 2013 Apr 12.

Visualizing functional motions of membrane transporters with molecular dynamics simulations.

Biochemistry. 2013 Jan 29;52(4):569-87. doi: 10.1021/bi301086x. Epub 2013 Jan 17.

A partition function approximation using elementary symmetric functions.

PLoS One. 2012;7(12):e51352. doi: 10.1371/journal.pone.0051352. Epub 2012 Dec 12.

A viral, transporter associated with antigen processing (TAP)-independent, high affinity ligand with alternative interactions endogenously presented by the nonclassical human leukocyte antigen E class I molecule.

J Biol Chem. 2012 Oct 12;287(42):34895-34903. doi: 10.1074/jbc.M112.362293. Epub 2012 Aug 27.

Preferred WMSA catalytic mechanism of the nucleotidyl transfer reaction in human DNA polymerase κ elucidates error-free bypass of a bulky DNA lesion.

Nucleic Acids Res. 2012 Oct;40(18):9193-205. doi: 10.1093/nar/gks653. Epub 2012 Jul 5.

H++ 3.0: automating pK prediction and the preparation of biomolecular structures for atomistic molecular modeling and simulations.

Nucleic Acids Res. 2012 Jul;40(Web Server issue):W537-41. doi: 10.1093/nar/gks375. Epub 2012 May 8.

Nature of allosteric inhibition in glutamate racemase: discovery and characterization of a cryptic inhibitory pocket using atomistic MD simulations and pKa calculations.

J Phys Chem B. 2011 Apr 7;115(13):3416-24. doi: 10.1021/jp201037t. Epub 2011 Mar 11.
