GFam: a platform for automatic annotation of gene families.

Affiliation

Department of Molecular, Cell and Developmental Biology, University of California at Los Angeles, Los Angeles, CA 90095, USA.

Publication information

Nucleic Acids Res. 2012 Oct;40(19):e152. doi: 10.1093/nar/gks631. Epub 2012 Jul 11.

DOI:10.1093/nar/gks631
PMID:22790981
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3479161/
Abstract

We have developed GFam, a platform for automatic annotation of gene/protein families. GFam provides a framework for genome initiatives and model organism resources to build domain-based families and derive meaningful functional labels, and it offers a seamless approach to propagating functional annotation across periodic genome updates. GFam is a hybrid approach: it first uses a greedy algorithm to chain component domains from the InterPro annotation provided by InterPro's 12 member resources, then applies a sequence-based connected component analysis to un-annotated sequence regions. Together, these steps yield a consensus domain architecture for each sequence, and families are subsequently generated from common architectures. For the Arabidopsis proteome, our integrated approach increases sequence coverage by 7.2 percentage points and residue coverage by 14.6 percentage points relative to the best single-constituent database within InterPro. The true power of GFam lies in maximizing the annotation provided by the different InterPro data sources, each of which offers resource-specific coverage for different regions of a sequence. GFam's ability to capture higher sequence and residue coverage can be useful for genome annotation, comparative genomics and functional studies. GFam is general-purpose software and can be used for any collection of protein sequences. The software is open source and can be obtained from http://www.paccanarolab.org/software/gfam/.
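The architecture-building idea in the abstract can be illustrated with a minimal Python sketch. It is not GFam's implementation. The `Hit` record, the function names, the domain identifiers (`domA`, `domB`, `domC`) and the toy coordinates are all hypothetical, and the greedy rule used here (take the longest hit first, skip any hit that overlaps one already kept) is an assumption, since the abstract does not specify the selection criterion. The second stage, the connected component analysis of un-annotated regions, is left out.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class Hit(NamedTuple):
    """One domain assignment on a protein (hypothetical record layout)."""
    source: str  # member database that reported the hit
    domain: str  # domain identifier (made up for this example)
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive


def overlaps(a: Hit, b: Hit) -> bool:
    """True if the two hits share at least one residue."""
    return a.start <= b.end and b.start <= a.end


def greedy_architecture(hits: List[Hit]) -> List[Hit]:
    """Assumed greedy rule: longest hits first, skip hits overlapping a kept one."""
    kept: List[Hit] = []
    for hit in sorted(hits, key=lambda h: (-(h.end - h.start + 1), h.start)):
        if not any(overlaps(hit, k) for k in kept):
            kept.append(hit)
    return sorted(kept, key=lambda h: h.start)  # N- to C-terminal order


def residue_coverage(arch: List[Hit], length: int) -> float:
    """Fraction of residues covered; kept hits never overlap, so lengths add up."""
    return sum(h.end - h.start + 1 for h in arch) / length


def families(archs: Dict[str, List[Hit]]) -> Dict[Tuple[str, ...], List[str]]:
    """Group proteins that share the same ordered tuple of domain identifiers."""
    groups: Dict[Tuple[str, ...], List[str]] = defaultdict(list)
    for protein, arch in archs.items():
        groups[tuple(h.domain for h in arch)].append(protein)
    return dict(groups)


# Toy input: protein id -> (sequence length, hits pooled from several sources).
proteins = {
    "P1": (300, [Hit("dbX", "domA", 10, 150), Hit("dbY", "domA", 20, 140),
                 Hit("dbZ", "domB", 200, 280)]),
    "P2": (250, [Hit("dbX", "domA", 5, 140), Hit("dbZ", "domB", 160, 240)]),
    "P3": (100, [Hit("dbX", "domC", 1, 90)]),
}

archs = {pid: greedy_architecture(hits) for pid, (_, hits) in proteins.items()}
coverage = {pid: residue_coverage(archs[pid], length)
            for pid, (length, _) in proteins.items()}
fams = families(archs)
```

In this toy data, P1 and P2 end up in the same family because both reduce to the architecture (`domA`, `domB`), while P3 forms its own family. Pooling hits from several sources before selecting is what lets the combined architecture cover more residues than any single source would.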
