FlowerPower: clustering proteins into domain architecture classes for phylogenomic inference of protein function.

Author information

Krishnamurthy Nandini, Brown Duncan, Sjölander Kimmen

Affiliation

Department of BioEngineering, 473 Evans Hall #1762, University of California, Berkeley, CA 94720-1762, USA.

Publication information

BMC Evol Biol. 2007 Feb 8;7 Suppl 1(Suppl 1):S12. doi: 10.1186/1471-2148-7-S1-S12.

Abstract

BACKGROUND

Function prediction by transfer of annotation from the top database hit in a homology search has been shown to be prone to systematic error. Phylogenomic analysis reduces these errors by inferring protein function within the evolutionary context of the entire family. However, accuracy of function prediction for multi-domain proteins depends on all members having the same overall domain structure. By contrast, most common homolog detection methods are optimized for retrieving local homologs, and do not address this requirement.

RESULTS

We present FlowerPower, a novel clustering algorithm designed to identify global homologs as a precursor to structural phylogenomic analysis. Like methods such as PSI-BLAST, FlowerPower clusters sequences iteratively. However, rather than expanding the cluster with a single hidden Markov model (HMM) or profile, FlowerPower identifies subfamilies using the SCI-PHY algorithm and then selects and aligns new homologs using subfamily HMMs. FlowerPower is shown to outperform BLAST, PSI-BLAST and the UCSC SAM-Target 2K methods at discriminating between proteins in the same domain architecture class and those with different overall domain structures.
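The iterative loop described above (find subfamilies in the current cluster, score candidates against each subfamily, add those that pass, repeat until nothing new is added) can be sketched as follows. This is a minimal illustration under stated assumptions, not FlowerPower's implementation: SCI-PHY and the subfamily HMMs are replaced by hypothetical stand-ins (`singleton_subfamilies` and `toy_subfamily_score`), and the 20% length-difference cutoff is an assumed crude proxy for requiring a global, full-length match. All function names, the k-mer scorer, and the threshold are illustrative only.

```python
from typing import Callable, Dict, List, Set

Subfamily = List[str]  # member sequences of one subfamily


def kmers(seq: str, k: int = 3) -> Set[str]:
    """Set of overlapping k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def toy_subfamily_score(subfamily: Subfamily, candidate: str) -> float:
    """Hypothetical stand-in for scoring a candidate with a subfamily HMM.

    Mean k-mer Jaccard similarity to the subfamily members; a member whose
    length differs from the candidate's by more than 20% contributes 0
    (assumed proxy for rejecting local, partial-length matches).
    """
    scores = []
    for member in subfamily:
        shorter, longer = sorted((len(member), len(candidate)))
        if shorter / longer < 0.8:
            scores.append(0.0)
            continue
        a, b = kmers(member), kmers(candidate)
        scores.append(len(a & b) / len(a | b))
    return sum(scores) / len(scores)


def singleton_subfamilies(members: List[str]) -> List[Subfamily]:
    """Hypothetical stand-in for SCI-PHY: every sequence is its own subfamily."""
    return [[m] for m in members]


def iterative_cluster(
    query_id: str,
    database: Dict[str, str],
    find_subfamilies: Callable[[List[str]], List[Subfamily]],
    subfamily_score: Callable[[Subfamily, str], float],
    threshold: float,
    max_iter: int = 10,
) -> Set[str]:
    """Grow a cluster from the query until no new sequence passes the threshold."""
    cluster = {query_id}
    for _ in range(max_iter):
        # Re-partition the current cluster into subfamilies each round.
        subfamilies = find_subfamilies([database[i] for i in sorted(cluster)])
        # A candidate joins if its best score over all subfamilies passes.
        new = {
            sid for sid, seq in database.items()
            if sid not in cluster
            and max(subfamily_score(sf, seq) for sf in subfamilies) >= threshold
        }
        if not new:
            break
        cluster |= new
    return cluster


db = {
    "q": "ACDEFGHIKL",             # query
    "a": "ACDEFGHIMN",             # close to q
    "b": "WCDEFGHIMN",             # close to a, weaker match to q
    "c": "PPPPPPPPPP",             # unrelated
    "d": "ACDEFGHIKL" + "Y" * 20,  # contains q but has extra sequence
}
print(sorted(iterative_cluster("q", db, singleton_subfamilies, toy_subfamily_score, 0.5)))
# → ['a', 'b', 'q']
```

In this toy run, `b` is only recruited in the second iteration, after `a` has joined and formed its own subfamily, which is the benefit of re-scoring against several subfamilies rather than one profile. `d` contains the whole query but is rejected by the length check, loosely mirroring the goal of excluding proteins whose overall domain structure differs.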

CONCLUSION

Structural phylogenomic analysis enables biologists to avoid the systematic errors associated with annotation transfer; clustering sequences based on sharing the same domain architecture is a critical first step in this process. FlowerPower is shown to consistently identify homologous sequences having the same domain architecture as the query.

AVAILABILITY

FlowerPower is available as a webserver at http://phylogenomics.berkeley.edu/flowerpower/.
