Annotation transfer for genomics: measuring functional divergence in multi-domain proteins.

Author information

Hegyi H, Gerstein M

Affiliation

Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut 06520, USA.

Publication information

Genome Res. 2001 Oct;11(10):1632-40. doi: 10.1101/gr.183801.

DOI: 10.1101/gr.183801

PMID: 11591640

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC311165/

Abstract

Annotation transfer is a principal process in genome annotation. It involves "transferring" structural and functional annotation to uncharacterized open reading frames (ORFs) in a newly completed genome from experimentally characterized proteins similar in sequence. To prevent errors in genome annotation, it is important that this process be robust and statistically well-characterized, especially with regard to how it depends on the degree of sequence similarity. Previously, we and others have analyzed annotation transfer in single-domain proteins. Multi-domain proteins, which make up the bulk of the ORFs in eukaryotic genomes, present more complex issues in functional conservation. Here we present a large-scale survey of annotation transfer in these proteins, using scop superfamilies to define domain folds and a thesaurus based on SWISS-PROT keywords to define functional categories. Our survey reveals that multi-domain proteins have significantly less functional conservation than single-domain ones, except when they share the exact same combination of domain folds. In particular, we find that for multi-domain proteins, approximate function can be accurately transferred with only 35% certainty for pairs of proteins sharing one structural superfamily. In contrast, this value is 67% for pairs of single-domain proteins sharing the same structural superfamily. On the other hand, if two multi-domain proteins contain the same combination of two structural superfamilies, the probability of their sharing the same function increases to 80%; in the case of complete coverage along the full length of both proteins, this value increases further to >90%. Moreover, we found that only 70 of the current total of 455 structural superfamilies are found in both single- and multi-domain proteins, and only 14 of these were associated with the same function in both categories of proteins.
We also investigated the degree to which function could be transferred between pairs of multi-domain proteins with respect to the degree of sequence similarity between them, finding that functional divergence at a given amount of sequence similarity is always about two-fold greater for pairs of multi-domain proteins (sharing similarity over a single domain) in comparison to pairs of single-domain ones, though the overall shape of the relationship is quite similar. Further information is available at http://partslist.org/func or http://bioinfo.mbb.yale.edu/partslist/func.
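The survey above compares pairs of proteins by their domain architecture and asks how often each kind of pair shares a functional category. The Python sketch below illustrates that bookkeeping on toy data. It is not the authors' code and does not reproduce their 35%, 67% or 80% figures.

Every specific in it is an assumption for illustration:

- the superfamily labels (`SF_A`, `SF_B`, `SF_C`);
- the tiny `THESAURUS` mapping keywords to categories (the paper builds its categories from SWISS-PROT keywords, but that thesaurus is not reproduced here);
- the rule that two proteins "share function" when their category sets overlap;
- the pair classes, with "same combination" treated as order-insensitive and "one shared superfamily" taken as exactly one.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass
from itertools import combinations

# Hypothetical keyword-to-category thesaurus. The paper derives its categories
# from SWISS-PROT keywords; this toy mapping only illustrates the idea.
THESAURUS = {
    "Kinase": "enzyme",
    "Transferase": "enzyme",
    "Hydrolase": "enzyme",
    "DNA-binding": "nucleic-acid binding",
    "Transcription regulation": "nucleic-acid binding",
    "Transport": "transport",
}


@dataclass(frozen=True)
class Protein:
    name: str
    superfamilies: tuple[str, ...]  # one structural superfamily per domain, N- to C-terminus
    keywords: frozenset[str]

    @property
    def categories(self) -> frozenset[str]:
        """Functional categories reached through the thesaurus; unknown keywords are ignored."""
        return frozenset(THESAURUS[k] for k in self.keywords if k in THESAURUS)


def pair_class(a: Protein, b: Protein) -> str | None:
    """Put a pair into one comparison class, or return None if it fits none."""
    shared = set(a.superfamilies) & set(b.superfamilies)
    if not shared:
        return None
    single_a, single_b = len(a.superfamilies) == 1, len(b.superfamilies) == 1
    if single_a and single_b:
        return "single-domain, same superfamily"
    if not single_a and not single_b:
        # Same multiset of superfamilies, regardless of domain order (assumption).
        if Counter(a.superfamilies) == Counter(b.superfamilies):
            return "multi-domain, same combination"
        if len(shared) == 1:
            return "multi-domain, one shared superfamily"
    # Mixed single/multi-domain pairs, and multi-domain pairs sharing several but
    # not all superfamilies, are left out of this simplified sketch.
    return None


def conservation_by_class(proteins: list[Protein]) -> dict[str, float]:
    """Fraction of pairs in each class whose functional category sets overlap."""
    same: Counter[str] = Counter()
    total: Counter[str] = Counter()
    for a, b in combinations(proteins, 2):
        cls = pair_class(a, b)
        if cls is None or not a.categories or not b.categories:
            continue  # skip unclassified pairs and proteins with no known category
        total[cls] += 1
        if a.categories & b.categories:
            same[cls] += 1
    return {cls: same[cls] / total[cls] for cls in total}


# Toy proteins with made-up superfamily labels (not real scop identifiers).
TOY_PROTEINS = [
    Protein("P1", ("SF_A",), frozenset({"Kinase"})),
    Protein("P2", ("SF_A",), frozenset({"Transferase"})),
    Protein("P3", ("SF_A", "SF_B"), frozenset({"Kinase"})),
    Protein("P4", ("SF_B", "SF_A"), frozenset({"Transferase"})),
    Protein("P5", ("SF_B", "SF_C"), frozenset({"Transport"})),
]

if __name__ == "__main__":
    print(conservation_by_class(TOY_PROTEINS))
    # {'single-domain, same superfamily': 1.0, 'multi-domain, same combination': 1.0,
    #  'multi-domain, one shared superfamily': 0.0}
```

On this toy set, the multi-domain pair with the same superfamily combination (P3, P4) agrees in function. The pairs that share only `SF_B` (P3 and P5, P4 and P5) do not. The numbers themselves carry no biological meaning.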