Top-down clustering for protein subfamily identification.

Affiliation

Department of Computer Science, KU Leuven, Belgium.

Publication information

Evol Bioinform Online. 2013 May 6;9:185-202. doi: 10.4137/EBO.S11609. Print 2013.

DOI:10.4137/EBO.S11609
PMID:23700359
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3653887/
Abstract

We propose a novel method for protein subfamily identification, that is, finding subgroups of functionally closely related sequences within a protein family. In line with phylogenomic analysis, the method first builds a hierarchical tree from a multiple alignment of the protein sequences, then uses a post-pruning procedure to extract clusters from the tree. Unlike existing methods, it constructs the hierarchical tree top-down rather than bottom-up, and associates particular mutations with each division into subclusters. The motivating hypothesis is that this may yield a better tree topology and, as a result, more accurate subfamily identification, while also indicating functionally important sites and allowing easy classification of new proteins. A thorough experimental evaluation confirms the hypothesis: the novel method yields more accurate clusters and a better tree topology than the state-of-the-art method SCI-PHY, identifies known functional sites, and identifies mutations that alone allow new sequences to be classified with an accuracy approaching that of hidden Markov models.
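The top-down idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the split criterion (size-weighted mean pairwise Hamming distance), the stopping rule that stands in for post-pruning, and all names (`mean_pairwise_hamming`, `best_split`, `build_tree`, `classify`) are assumptions chosen for illustration. Each internal node stores a test of the form "alignment column c holds residue r", so a new aligned sequence can be classified by following those tests alone.

```python
from itertools import combinations


def mean_pairwise_hamming(seqs):
    """Average Hamming distance over all pairs of equal-length sequences."""
    if len(seqs) < 2:
        return 0.0
    pairs = list(combinations(seqs, 2))
    total = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs)
    return total / len(pairs)


def best_split(seqs, min_size):
    """Find the test 'seq[col] == residue' with the lowest size-weighted spread.

    Returns (score, col, residue), or None if no test leaves at least
    min_size sequences on both sides.
    """
    best = None
    for col in range(len(seqs[0])):
        for residue in sorted({s[col] for s in seqs}):
            yes = [s for s in seqs if s[col] == residue]
            no = [s for s in seqs if s[col] != residue]
            if len(yes) < min_size or len(no) < min_size:
                continue
            score = (len(yes) * mean_pairwise_hamming(yes)
                     + len(no) * mean_pairwise_hamming(no)) / len(seqs)
            if best is None or score < best[0]:
                best = (score, col, residue)
    return best


def build_tree(seqs, min_size=2):
    """Recursively divide the alignment, recording the residue test at each split."""
    node = {"members": list(seqs)}
    split = best_split(seqs, min_size)
    # Assumed stopping rule: only split when it reduces the within-cluster spread.
    if split is None or split[0] >= mean_pairwise_hamming(seqs):
        return node
    _, col, residue = split
    node["test"] = (col, residue)
    node["yes"] = build_tree([s for s in seqs if s[col] == residue], min_size)
    node["no"] = build_tree([s for s in seqs if s[col] != residue], min_size)
    return node


def classify(tree, seq):
    """Follow the residue tests down to a leaf and return that leaf's members."""
    node = tree
    while "test" in node:
        col, residue = node["test"]
        node = node["yes"] if seq[col] == residue else node["no"]
    return node["members"]


# Toy alignment (made-up sequences, already aligned to the same length).
alignment = ["ACDK", "ACDR", "ACEK", "GHDK", "GHDR", "GHEK"]
tree = build_tree(alignment, min_size=2)
print(tree["test"])            # (0, 'A')
print(classify(tree, "ACDQ"))  # ['ACDK', 'ACDR', 'ACEK']
```

On this toy input the root split is labelled with the residue `A` at column 0, which is the kind of split-defining mutation the abstract refers to. A real pipeline would need a proper alignment, a principled split heuristic, and a genuine pruning step.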


Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0bb3/3653887/5880a5109cbe/ebo-9-2013-185f1.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0bb3/3653887/1ebbce23751e/ebo-9-2013-185f2.jpg
Figure 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0bb3/3653887/59d9e065b8e6/ebo-9-2013-185f3.jpg
Figure 4: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0bb3/3653887/aa41a2f758f2/ebo-9-2013-185f4.jpg
Figure 5: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0bb3/3653887/070f1fe52aa5/ebo-9-2013-185f5.jpg
