GeMMA: functional subfamily classification within superfamilies of predicted protein structural domains.

Affiliation

University College London - Structural and Molecular Biology, London, UK.

Publication information

Nucleic Acids Res. 2010 Jan;38(3):720-37. doi: 10.1093/nar/gkp1049. Epub 2009 Nov 18.

DOI: 10.1093/nar/gkp1049
PMID: 19923231
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2817468/
Abstract

GeMMA (Genome Modelling and Model Annotation) is a new approach to automatic functional subfamily classification within families and superfamilies of protein sequences. A major advantage of GeMMA is its ability to subclassify very large and diverse superfamilies with tens of thousands of members, without the need for an initial multiple sequence alignment. Its performance is shown to be comparable to the established high-performance method SCI-PHY. GeMMA follows an agglomerative clustering protocol that uses existing software for sensitive and accurate multiple sequence alignment and profile-profile comparison. The produced subfamilies are shown to be equivalent in quality whether whole protein sequences are used or just the sequences of component predicted structural domains. A faster, heuristic version of GeMMA that also uses distributed computing is shown to maintain the performance levels of the original implementation. The use of GeMMA to increase the functional annotation coverage of functionally diverse Pfam families is demonstrated. It is further shown how GeMMA clusters can help to predict the impact of experimentally determining a protein domain structure on comparative protein modelling coverage, in the context of structural genomics.
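
The agglomerative protocol described in the abstract can be illustrated with a minimal Python sketch. This is not the GeMMA implementation. The k-mer Jaccard score in `cluster_similarity` is a hypothetical stand-in for the profile-profile comparison step, the re-alignment of merged clusters is omitted, and the function names and the `threshold` parameter are assumptions made for illustration only.

```python
from itertools import combinations


def kmer_set(seq, k=3):
    """Return the set of overlapping k-mers in one sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def cluster_similarity(a, b, k=3):
    """Toy stand-in for profile-profile comparison (assumption):
    Jaccard index of the k-mers pooled over each cluster."""
    ka = set().union(*(kmer_set(s, k) for s in a))
    kb = set().union(*(kmer_set(s, k) for s in b))
    union = ka | kb
    return len(ka & kb) / len(union) if union else 0.0


def agglomerate(sequences, threshold):
    """Greedy agglomerative clustering.

    Start with one cluster per sequence, repeatedly merge the most
    similar pair of clusters, and stop once the best remaining pair
    scores below `threshold`.
    """
    clusters = [[s] for s in sequences]
    while len(clusters) > 1:
        # All-against-all comparison of the current clusters.
        score, i, j = max(
            (cluster_similarity(clusters[i], clusters[j]), i, j)
            for i, j in combinations(range(len(clusters)), 2)
        )
        if score < threshold:
            break
        merged = clusters[i] + clusters[j]
        clusters = [c for n, c in enumerate(clusters) if n not in (i, j)]
        clusters.append(merged)
    return clusters
```

Calling `agglomerate(["MKVLAAGIV", "MKVLAAGLV", "GSHMTTPEQ", "GSHMTTPEN"], threshold=0.3)` groups the two `MKV...` sequences together and the two `GSH...` sequences together. The all-against-all comparison in every iteration is the expensive part of this naive version, which is the kind of cost that a heuristic, distributed variant would aim to reduce.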


Figures

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4a2e/2817468/975d994d85b1/gkp1049f1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4a2e/2817468/4cef0976f0b5/gkp1049f2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4a2e/2817468/687a41659d03/gkp1049f3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4a2e/2817468/067f6962c57a/gkp1049f4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4a2e/2817468/71a398bb77ae/gkp1049f5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4a2e/2817468/43c8e87caab4/gkp1049f6.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4a2e/2817468/8f0e99618fca/gkp1049f7.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4a2e/2817468/be8ce5efb8ab/gkp1049f8.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4a2e/2817468/b754fbe7ed22/gkp1049f9.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4a2e/2817468/8ee463ae5f5e/gkp1049f10.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4a2e/2817468/28a809920b03/gkp1049f11.jpg

