Surveying the manifold divergence of an entire protein class for statistical clues to underlying biochemical mechanisms.

Author information

Neuwald Andrew F

Affiliation

The University of Maryland, MD, USA.

Publication information

Stat Appl Genet Mol Biol. 2011;10(1):Article 36. doi: 10.2202/1544-6115.1666. Epub 2011 Aug 4.

DOI:10.2202/1544-6115.1666

PMID:22331370

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3176138/

Abstract

Certain residues have no known function yet are co-conserved across distantly related protein families and diverse organisms, suggesting that they perform critical roles associated with as-yet-unidentified molecular properties and mechanisms. This raises the question of how to obtain additional clues regarding these mysterious biochemical phenomena with a view to formulating experimentally testable hypotheses. One approach is to access the implicit biochemical information encoded within the vast amount of genomic sequence data now becoming available. Here, a new Gibbs sampling strategy is formulated and implemented that can partition hundreds of thousands of sequences within a major protein class into multiple, functionally-divergent categories based on those pattern residues that best discriminate between categories. The sampler precisely defines the partition and pattern for each category by explicitly modeling unrelated, non-functional and related-yet-divergent proteins that would otherwise obscure the analysis. To aid biological interpretation, auxiliary routines can characterize pattern residues within available crystal structures and identify those structures most likely to shed light on the roles of pattern residues. This approach can be used to define and annotate automatically subgroup-specific conserved domain profiles based on statistically-rigorous empirical criteria rather than on the subjective and labor-intensive process of manual curation. Incorporating such profiles into domain database search sites (such as the NCBI BLAST site) will provide biologists with previously inaccessible molecular information useful for hypothesis generation and experimental design. Analyses of P-loop GTPases and of AAA+ ATPases illustrate the sampler's ability to obtain such information.
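The partitioning idea in the abstract can be illustrated with a toy example. The sketch below is not the sampler described in the paper: it is a generic collapsed Gibbs sampler for a mixture of per-column residue distributions, and it omits the pattern-residue selection and the explicit modeling of unrelated or non-functional sequences that the abstract mentions. The function name `gibbs_partition`, the hyperparameters `alpha` and `beta`, and the optional `init` argument are all assumptions made for illustration.

```python
import math
import random

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(ALPHABET)}


def gibbs_partition(seqs, k, sweeps=50, alpha=1.0, beta=0.1, seed=0, init=None):
    """Toy collapsed Gibbs sampler (illustrative only, not the paper's model).

    seqs:  equal-length, ungapped aligned sequences over ALPHABET.
    k:     number of categories.
    alpha: pseudocount on category sizes (assumed hyperparameter).
    beta:  pseudocount on residue counts per column (assumed hyperparameter).
    init:  optional starting labels; random labels are used when omitted.
    Returns one category label in range(k) per sequence.
    """
    rng = random.Random(seed)
    n, length, a = len(seqs), len(seqs[0]), len(ALPHABET)
    enc = [[AA_INDEX[c] for c in s] for s in seqs]
    labels = list(init) if init is not None else [rng.randrange(k) for _ in range(n)]
    size = [0] * k
    # counts[cat][column][residue] = number of member sequences with that residue
    counts = [[[0] * a for _ in range(length)] for _ in range(k)]

    def update(i, cat, delta):
        """Add (delta=+1) or remove (delta=-1) sequence i from category cat."""
        size[cat] += delta
        for j, r in enumerate(enc[i]):
            counts[cat][j][r] += delta

    for i, cat in enumerate(labels):
        update(i, cat, +1)

    for _ in range(sweeps):
        for i in range(n):
            update(i, labels[i], -1)  # hold sequence i out of its category
            logp = []
            for cat in range(k):
                # category prior times the Dirichlet-multinomial predictive
                # probability of each residue of sequence i
                lp = math.log(size[cat] + alpha)
                denom = math.log(size[cat] + a * beta)
                for j, r in enumerate(enc[i]):
                    lp += math.log(counts[cat][j][r] + beta) - denom
                logp.append(lp)
            top = max(logp)
            weights = [math.exp(lp - top) for lp in logp]
            labels[i] = rng.choices(range(k), weights=weights)[0]
            update(i, labels[i], +1)
    return labels
```

For example, given six copies of `"ACDEFGHIKL"` followed by six copies of `"MNPQRSTVWY"` and starting labels in which one sequence from each group is misassigned, a few sweeps should move both sequences back to their own group. From a random start, a sampler this simple can settle into a state where both groups share one category, so in practice several restarts with different seeds would be compared.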
