A Bayesian sampler for optimization of protein domain hierarchies.

Author information

Neuwald Andrew F

Affiliation

Institute for Genome Sciences and Department of Biochemistry & Molecular Biology, University of Maryland School of Medicine, Baltimore, Maryland.

Publication information

J Comput Biol. 2014 Mar;21(3):269-86. doi: 10.1089/cmb.2013.0099. Epub 2014 Feb 4.

DOI: 10.1089/cmb.2013.0099

PMID: 24494927

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3948484/

Abstract

The process of identifying and modeling functionally divergent subgroups for a specific protein domain class and arranging these subgroups hierarchically has, thus far, largely been done via manual curation. How to accomplish this automatically and optimally is an unsolved statistical and algorithmic problem that is addressed here via Markov chain Monte Carlo sampling. Taking as input a (typically very large) multiple-sequence alignment, the sampler creates and optimizes a hierarchy by adding and deleting leaf nodes, by moving nodes and subtrees up and down the hierarchy, by inserting or deleting internal nodes, and by redefining the sequences and conserved patterns associated with each node. All such operations are based on a probability distribution that models the conserved and divergent patterns defining each subgroup. When we view these patterns as sequence determinants of protein function, each node or subtree in such a hierarchy corresponds to a subgroup of sequences with similar biological properties. The sampler can be applied either de novo or to an existing hierarchy. When applied to 60 protein domains from multiple starting points in this way, it converged on similar solutions with nearly identical log-likelihood ratio scores, suggesting that it typically finds the optimal peak in the posterior probability distribution. Similarities and differences between independently generated, nearly optimal hierarchies for a given domain help distinguish robust from statistically uncertain features. Thus, a future application of the sampler is to provide confidence measures for various features of a domain hierarchy.
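The general idea of sampling subgroup assignments under a probability model can be illustrated with a deliberately small sketch. This is not the sampler described in the abstract: it handles only a flat, one-level set of subgroups (no internal nodes or subtree moves), it scores columns with a plain Dirichlet-multinomial marginal likelihood rather than the paper's model of conserved and divergent patterns, and it assumes a uniform prior over label assignments. The names `ALPHA`, `column_log_marginal`, `log_score`, and `metropolis`, the pseudocount value, and the toy sequences are all hypothetical choices made for illustration.

```python
import math
import random
from collections import Counter

ALPHABET_SIZE = 20  # amino-acid alphabet
ALPHA = 0.05        # hypothetical Dirichlet pseudocount per residue type


def column_log_marginal(residues):
    """Dirichlet-multinomial log marginal likelihood of one alignment column."""
    n = len(residues)
    prior_mass = ALPHABET_SIZE * ALPHA
    total = math.lgamma(prior_mass) - math.lgamma(prior_mass + n)
    for count in Counter(residues).values():
        total += math.lgamma(ALPHA + count) - math.lgamma(ALPHA)
    return total


def log_score(seqs, labels):
    """Sum of column scores over all non-empty subgroups (toy objective)."""
    groups = {}
    for seq, label in zip(seqs, labels):
        groups.setdefault(label, []).append(seq)
    return sum(
        column_log_marginal(column)
        for members in groups.values()
        for column in zip(*members)
    )


def metropolis(seqs, max_groups=3, steps=2000, seed=0):
    """Metropolis sampling over subgroup labels; returns the best state seen.

    Requires max_groups >= 2. Moving a sequence into an unused label acts
    like adding a leaf; emptying a label acts like deleting one.
    """
    rng = random.Random(seed)
    labels = [0] * len(seqs)  # start with every sequence in one subgroup
    current = log_score(seqs, labels)
    best, best_score = list(labels), current
    for _ in range(steps):
        i = rng.randrange(len(seqs))
        old = labels[i]
        # Symmetric proposal: a uniformly chosen different label.
        labels[i] = rng.choice([g for g in range(max_groups) if g != old])
        proposed = log_score(seqs, labels)
        # Accept with probability min(1, exp(proposed - current)).
        if rng.random() < math.exp(min(0.0, proposed - current)):
            current = proposed
            if current > best_score:
                best, best_score = list(labels), current
        else:
            labels[i] = old  # reject: restore the previous label
    return best, best_score


if __name__ == "__main__":
    # Toy aligned sequences with two visibly different subgroups.
    toy = ["ACKGL", "ACKGL", "ACRGL", "WCEHM", "WCEHM", "WCDHM"]
    labels, score = metropolis(toy)
    print(labels, round(score, 3))
```

Running the sampler from several seeds and comparing the best scores is a small-scale analogue of the multiple-starting-point convergence check mentioned in the abstract. Because the proposal is symmetric, the plain Metropolis acceptance rule is sufficient here; a sampler with hierarchy-changing moves would generally need proposal corrections that this sketch does not attempt.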
