FlowerPower: clustering proteins into domain architecture classes for phylogenomic inference of protein function.

Author information

Krishnamurthy Nandini, Brown Duncan, Sjölander Kimmen

Affiliation

Department of BioEngineering, 473 Evans Hall #1762, University of California, Berkeley, CA 94720-1762, USA.

Publication information

BMC Evol Biol. 2007 Feb 8;7 Suppl 1(Suppl 1):S12. doi: 10.1186/1471-2148-7-S1-S12.

DOI:10.1186/1471-2148-7-S1-S12

PMID:17288570

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC1796606/

Abstract

BACKGROUND

Function prediction by transfer of annotation from the top database hit in a homology search has been shown to be prone to systematic error. Phylogenomic analysis reduces these errors by inferring protein function within the evolutionary context of the entire family. However, the accuracy of function prediction for multi-domain proteins depends on all family members having the same overall domain structure. By contrast, the most commonly used homolog detection methods are optimized for retrieving local homologs and do not address this requirement.
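To make the distinction between local and global homologs concrete, the sketch below represents a protein's domain architecture as the ordered tuple of its domain labels. Two proteins can share a domain (a local match, which is all a top database hit guarantees) without belonging to the same domain architecture class. The domain labels are hypothetical placeholders, and defining a class as an exact match of the ordered domain list is a simplifying assumption, not necessarily the criterion used by the authors.

```python
from typing import Sequence


def shares_domain(a: Sequence[str], b: Sequence[str]) -> bool:
    """Local-homology view: the two proteins have at least one domain in common."""
    return bool(set(a) & set(b))


def same_architecture(a: Sequence[str], b: Sequence[str]) -> bool:
    """Global-homology view (simplified assumption): same domains in the same order."""
    return tuple(a) == tuple(b)


# Hypothetical domain labels, for illustration only.
query = ("DomainA", "DomainB")
hit_1 = ("DomainA", "DomainB")              # same overall architecture as the query
hit_2 = ("DomainA", "DomainC", "DomainD")   # shares only DomainA with the query

for name, hit in [("hit_1", hit_1), ("hit_2", hit_2)]:
    print(name, shares_domain(query, hit), same_architecture(query, hit))
```

Both hits would be retrieved by a local search, but only `hit_1` passes the architecture check; annotation transferred from `hit_2` would rest on only part of the query's structure.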

RESULTS

We present FlowerPower, a novel clustering algorithm designed for the identification of global homologs as a precursor to structural phylogenomic analysis. Similar to methods such as PSI-BLAST, FlowerPower employs an iterative approach to clustering sequences. However, rather than using a single HMM or profile to expand the cluster, FlowerPower identifies subfamilies using the SCI-PHY algorithm and then selects and aligns new homologs using subfamily hidden Markov models. FlowerPower is shown to outperform BLAST, PSI-BLAST and the UCSC SAM-Target 2K methods at discriminating between proteins in the same domain architecture class and those having different overall domain structures.
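The iterative scheme described above can be sketched as a generic loop: split the current cluster into subfamilies, score every outside candidate against each subfamily model, admit candidates that score well against at least one subfamily, and repeat until nothing new is admitted. This is a minimal sketch under stated assumptions, not the FlowerPower implementation. Subfamily identification and scoring are passed in as functions; the toy stand-ins used here (`toy_split`, which makes each member its own "subfamily", and `toy_score`, a 3-mer Jaccard similarity) are hypothetical placeholders for SCI-PHY and subfamily HMM scoring, and the fixed `threshold` admission rule is an assumption, since the abstract does not give the selection criteria.

```python
from typing import Callable, Dict, List, Set

Database = Dict[str, str]   # sequence ID -> amino-acid sequence
Subfamily = List[str]       # IDs of the sequences in one subfamily


def expand_cluster(
    seed: Set[str],
    database: Database,
    split_subfamilies: Callable[[Set[str], Database], List[Subfamily]],
    score: Callable[[Subfamily, str, Database], float],
    threshold: float,
    max_iter: int = 10,
) -> Set[str]:
    """Grow a cluster by re-splitting it into subfamilies each round and
    admitting candidates that score >= threshold against any subfamily."""
    if not seed:
        raise ValueError("seed must contain at least one sequence ID")
    cluster = set(seed)
    for _ in range(max_iter):
        subfamilies = split_subfamilies(cluster, database)
        new_members = {
            seq_id
            for seq_id in database
            if seq_id not in cluster
            and max(score(sf, seq_id, database) for sf in subfamilies) >= threshold
        }
        if not new_members:  # converged: no candidate was admitted this round
            break
        cluster |= new_members
    return cluster


def kmers(seq: str, k: int = 3) -> Set[str]:
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def toy_split(cluster: Set[str], db: Database) -> List[Subfamily]:
    # Hypothetical placeholder for SCI-PHY: every member is its own subfamily.
    return [[seq_id] for seq_id in sorted(cluster)]


def toy_score(sf: Subfamily, seq_id: str, db: Database) -> float:
    # Hypothetical placeholder for a subfamily HMM score: 3-mer Jaccard similarity.
    pooled = set().union(*(kmers(db[m]) for m in sf))
    cand = kmers(db[seq_id])
    union = pooled | cand
    return len(pooled & cand) / len(union) if union else 0.0


db = {
    "q":  "ACDEFGHIKL",
    "s1": "ACDEFGHIKM",   # close to q
    "s2": "CDEFGHIKMN",   # close to s1, below threshold against q alone
    "s3": "WWWWWWWWWW",   # unrelated
}
print(sorted(expand_cluster({"q"}, db, toy_split, toy_score, threshold=0.7)))
# ['q', 's1', 's2']
```

`s2` is admitted only in the second round, after `s1` has joined the cluster and contributed its own subfamily; this is the kind of iterative growth that separates profile-style methods from a single pairwise search.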

CONCLUSION

Structural phylogenomic analysis enables biologists to avoid the systematic errors associated with annotation transfer; clustering sequences that share the same domain architecture is a critical first step in this process. FlowerPower is shown to consistently identify homologous sequences having the same domain architecture as the query.
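One way to quantify how consistently a method retrieves sequences sharing the query's domain architecture is the fraction of retrieved sequences whose architecture matches the query's, a precision-style measure. The sketch below assumes the architecture of every sequence is known and treats "same architecture" as an exact match of the ordered domain list. The abstract does not state which evaluation measure the authors used, and the sequence IDs and domain labels are hypothetical.

```python
from typing import Dict, Iterable, Tuple

Architecture = Tuple[str, ...]


def architecture_precision(
    query_arch: Architecture,
    retrieved: Iterable[str],
    architectures: Dict[str, Architecture],
) -> float:
    """Fraction of retrieved sequences whose domain architecture equals the query's."""
    hits = list(retrieved)
    if not hits:
        return 0.0
    matches = sum(1 for seq_id in hits if architectures[seq_id] == query_arch)
    return matches / len(hits)


# Hypothetical architectures for four retrieved sequences.
archs = {
    "p1": ("DomainA", "DomainB"),
    "p2": ("DomainA", "DomainB"),
    "p3": ("DomainA",),                        # partial: missing DomainB
    "p4": ("DomainC", "DomainA", "DomainB"),   # extra N-terminal domain
}
print(architecture_precision(("DomainA", "DomainB"), ["p1", "p2", "p3", "p4"], archs))
# 0.5
```

Precision alone rewards a method that returns very few hits; a fuller comparison would also count same-architecture sequences that a method fails to retrieve (recall).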

AVAILABILITY

FlowerPower is available as a webserver at http://phylogenomics.berkeley.edu/flowerpower/.
