Identification and distribution of protein families in 120 completed genomes using Gene3D.

Authors

Lee David, Grant Alastair, Marsden Russell L, Orengo Christine

Affiliation

Biomolecular Structure and Modelling Group, Department of Biochemistry, University College London, Gower Street, London.

Publication details

Proteins. 2005 May 15;59(3):603-15. doi: 10.1002/prot.20409.

DOI: 10.1002/prot.20409
PMID: 15768405

Abstract

Using a new protocol, PFscape, we undertake a systematic identification of protein families and domain architectures in 120 complete genomes. PFscape clusters sequences into protein families using a Markov clustering algorithm (Enright et al., Nucleic Acids Res 2002;30:1575-1584) followed by complete linkage clustering according to sequence identity. Within each protein family, domains are recognized using a library of hidden Markov models comprising CATH structural and Pfam functional domains. Domain architectures are then determined using DomainFinder (Pearl et al., Protein Sci 2002;11:233-244) and the protein family and domain architecture data are amalgamated in the Gene3D database (Buchan et al., Genome Res 2002;12:503-514). Using Gene3D, we have investigated protein sequence space, the extent of structural annotation, and the distribution of different domain architectures in completed genomes from all kingdoms of life. As with earlier studies by other researchers, the distribution of domain families shows power-law behavior such that the largest 2,000 domain families can be mapped to approximately 70% of nonsingleton genome sequences; the remaining sequences are assigned to much smaller families. While approximately 50% of domain annotations within a genome are assigned to 219 universal domain families, a much smaller proportion (< 10%) of protein sequences are assigned to universal protein families. This supports the mosaic theory of evolution whereby domain duplication followed by domain shuffling gives rise to novel domain architectures that can expand the protein functional repertoire of an organism. Functional data (e.g. COG/KEGG/GO) integrated within Gene3D result in a comprehensive resource that is currently being used in structure genomics initiatives and can be accessed via http://www.biochem.ucl.ac.uk/bsm/cath/Gene3D/.
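
As a rough illustration of the second PFscape stage mentioned in the abstract (complete linkage clustering according to sequence identity), the sketch below merges clusters only while every cross-cluster pair meets an identity threshold. It is not the authors' implementation: the function name `complete_linkage`, the 0.3 threshold, the toy identity values, and the choice to treat missing pairs as 0.0 are all assumptions made for illustration, and the preceding Markov clustering step is omitted.

```python
from itertools import combinations


def complete_linkage(ids, identity, threshold):
    """Agglomerative complete-linkage clustering on pairwise identities.

    ids: list of sequence names.
    identity: dict mapping frozenset({a, b}) -> fractional identity (0..1);
              pairs missing from the dict are treated as 0.0 (assumption).
    threshold: two clusters merge only if every cross pair has
               identity >= threshold.
    """
    clusters = [frozenset([i]) for i in ids]

    def linkage(c1, c2):
        # Complete linkage: two clusters are only as similar as their weakest pair.
        return min(identity.get(frozenset((a, b)), 0.0) for a in c1 for b in c2)

    while len(clusters) > 1:
        # Pick the most similar pair of clusters under complete linkage.
        best_score, (c1, c2) = max(
            ((linkage(a, b), (a, b)) for a, b in combinations(clusters, 2)),
            key=lambda item: item[0],
        )
        if best_score < threshold:
            break  # no remaining pair of clusters is similar enough to merge
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
    return sorted(sorted(c) for c in clusters)


# Hypothetical toy data: C resembles A (0.4) but not B (0.2).
ids = ["A", "B", "C", "D", "E"]
identity = {
    frozenset(("A", "B")): 0.9,
    frozenset(("A", "C")): 0.4,
    frozenset(("B", "C")): 0.2,
    frozenset(("D", "E")): 0.6,
}
families = complete_linkage(ids, identity, threshold=0.3)
print(families)  # [['A', 'B'], ['C'], ['D', 'E']]
```

In the toy data, C stays out of the A/B cluster because its identity to B falls below the threshold, even though its identity to A does not. Single linkage would have merged it, which is the practical difference complete linkage makes when splitting a family into tighter subgroups.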

Similar articles

1. Identification and distribution of protein families in 120 completed genomes using Gene3D.
   Proteins. 2005 May 15;59(3):603-15. doi: 10.1002/prot.20409.
2. The CATH Domain Structure Database and related resources Gene3D and DHS provide comprehensive domain family information for genome analysis.
   Nucleic Acids Res. 2005 Jan 1;33(Database issue):D247-51. doi: 10.1093/nar/gki024.
3. Domain-based and family-specific sequence identity thresholds increase the levels of reliable protein function transfer.
   J Mol Biol. 2009 Mar 27;387(2):416-30. doi: 10.1016/j.jmb.2008.12.045. Epub 2008 Dec 25.
4. Evolution of function in protein superfamilies, from a structural perspective.
   J Mol Biol. 2001 Apr 6;307(4):1113-43. doi: 10.1006/jmbi.2001.4513.
5. Gene3D: merging structure and function for a Thousand genomes.
   Nucleic Acids Res. 2010 Jan;38(Database issue):D296-300. doi: 10.1093/nar/gkp987. Epub 2009 Nov 11.
6. Domain combinations in archaeal, eubacterial and eukaryotic proteomes.
   J Mol Biol. 2001 Jul 6;310(2):311-25. doi: 10.1006/jmbi.2001.4776.
7. Automatic annotation of protein function based on family identification.
   Proteins. 2003 Nov 15;53(3):683-92. doi: 10.1002/prot.10449.
8. Accurate domain identification with structure-anchored hidden Markov models, saHMMs.
   Proteins. 2009 Aug 1;76(2):343-52. doi: 10.1002/prot.22349.
9. A proteome-wide analysis of domain architectures of prokaryotic single-spanning transmembrane proteins.
   Comput Biol Chem. 2005 Oct;29(5):379-87. doi: 10.1016/j.compbiolchem.2005.08.004. Epub 2005 Oct 6.
10. EyeSite: a semi-automated database of protein families in the eye.
    Nucleic Acids Res. 2004 Jan 1;32(Database issue):D148-52. doi: 10.1093/nar/gkh090.

Cited by

1. Site-Directed Mutagenesis Mediated by Molecular Modeling and Docking and Its Effect on the Protein-Protein Interactions of the bHLH Transcription Factors SPATULA, HECATE1, and INDEHISCENT.
   Plants (Basel). 2025 Jun 8;14(12):1756. doi: 10.3390/plants14121756.
2. Improving enzyme functional annotation by integrating in vitro and in silico approaches: The example of histidinol phosphate phosphatases.
   Protein Sci. 2024 Feb;33(2):e4899. doi: 10.1002/pro.4899.
3. Size distribution of function-based human gene sets and the split-merge model.
   R Soc Open Sci. 2016 Aug 3;3(8):160275. doi: 10.1098/rsos.160275. eCollection 2016 Aug.
4. The impact of structural genomics: the first quindecennial.
   J Struct Funct Genomics. 2016 Mar;17(1):1-16. doi: 10.1007/s10969-016-9201-5. Epub 2016 Mar 2.
5. The history of the CATH structural classification of protein domains.
   Biochimie. 2015 Dec;119:209-17. doi: 10.1016/j.biochi.2015.08.004. Epub 2015 Aug 4.
6. UniRef clusters: a comprehensive and scalable alternative for improving sequence similarity searches.
   Bioinformatics. 2015 Mar 15;31(6):926-32. doi: 10.1093/bioinformatics/btu739. Epub 2014 Nov 13.
7. Target selection for structural genomics based on combining fold recognition and crystallisation prediction methods: application to the human proteome.
   J Struct Funct Genomics. 2012 Mar;13(1):37-46. doi: 10.1007/s10969-012-9130-x. Epub 2012 Feb 22.
8. The evolutionary origin of orphan genes.
   Nat Rev Genet. 2011 Aug 31;12(10):692-702. doi: 10.1038/nrg3053.
9. Internal organization of large protein families: relationship between the sequence, structure, and function-based clustering.
   Proteins. 2011 Aug;79(8):2389-402. doi: 10.1002/prot.23049. Epub 2011 May 31.
10. Scaling properties of protein family phylogenies.
    BMC Evol Biol. 2011 Jun 6;11:155. doi: 10.1186/1471-2148-11-155.