PHOG-BLAST--a new generation tool for fast similarity search of protein families.

Author information

Merkeev Igor V, Mironov Andrey A

Affiliation

State Scientific Center GosNIIGenetica, 1st Dorozhny pr., 1, Moscow, 113545, Russia.

Publication information

BMC Evol Biol. 2006 Jun 22;6:51. doi: 10.1186/1471-2148-6-51.

DOI:10.1186/1471-2148-6-51

PMID:16792802

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC1522020/

Abstract

BACKGROUND

The need to compare protein profiles arises frequently in protein research: comparison of protein families, domain searches, and resolution of orthology and paralogy. Existing fast algorithms can only compare a protein sequence with a protein sequence, or a profile with a sequence. Algorithms that compare profiles with each other rely on dynamic programming and complex scoring functions.

RESULTS

We developed a new algorithm, PHOG-BLAST, for fast similarity search of profiles. The algorithm discretizes each profile, converting it to a string over a finite alphabet, and uses hashing for fast search. To determine the optimal alphabet, we analyzed columns in reliable multiple alignments and obtained column clusters in the 20-dimensional profile space by applying a special clustering procedure. We show that the clustering procedure works best when its parameters are chosen so that 20 profile clusters are obtained, which can be interpreted as ancestral amino acid residues. With these clusters, fewer than 2% of the columns in multiple alignments fall outside the clusters. We tested the performance of PHOG-BLAST against PSI-BLAST on three well-known databases of multiple alignments: COG, PFAM and BALIBASE. On the COG database both algorithms showed the same performance; on PFAM and BALIBASE, PHOG-BLAST was much superior to PSI-BLAST. PHOG-BLAST required 10-20 times less computer memory and computation time than PSI-BLAST.
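
The two ideas named above, profile discretization and hashed search, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The centroids are toy stand-ins (one per amino acid, with weight 0.81 on that residue), whereas the paper derives its clusters from real alignments. The distance threshold `max_dist`, the word length `k`, the function names, and the seed-counting score are all assumptions chosen for illustration.

```python
from collections import Counter, defaultdict
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Toy centroids (an assumption, not the clusters from the paper):
# cluster "A" puts weight 0.81 on residue A and 0.01 on each other residue.
CENTROIDS = {
    aa: [0.81 if other == aa else 0.01 for other in AMINO_ACIDS]
    for aa in AMINO_ACIDS
}

def profile_from_alignment(rows):
    """Turn an ungapped alignment (equal-length strings) into a list of
    20-dimensional residue-frequency vectors, one per column."""
    profile = []
    for column in zip(*rows):
        counts = Counter(column)
        profile.append([counts[aa] / len(column) for aa in AMINO_ACIDS])
    return profile

def discretize(profile, max_dist=0.6):
    """Map each column to the letter of its nearest centroid, or to 'X'
    when no centroid lies within max_dist (a column outside all clusters)."""
    letters = []
    for column in profile:
        best_letter, best_dist = min(
            ((aa, math.dist(column, c)) for aa, c in CENTROIDS.items()),
            key=lambda pair: pair[1],
        )
        letters.append(best_letter if best_dist <= max_dist else "X")
    return "".join(letters)

def build_index(database, k=3):
    """Hash every k-letter word of every discretized profile to its
    (profile name, position) occurrences; words containing 'X' are skipped."""
    index = defaultdict(list)
    for name, word in database.items():
        for i in range(len(word) - k + 1):
            kmer = word[i:i + k]
            if "X" not in kmer:
                index[kmer].append((name, i))
    return index

def search(query, index, k=3):
    """Count shared k-letter words between the query and each indexed
    profile; return (name, count) pairs, best first."""
    hits = Counter()
    for i in range(len(query) - k + 1):
        for name, _pos in index.get(query[i:i + k], ()):
            hits[name] += 1
    return hits.most_common()

# Hypothetical toy families, used only to exercise the functions.
families = {
    "fam1": ["MKVLAGD", "MKVLAGD", "MRVLAGD"],
    "fam2": ["WWPHCCY", "WWPHCCY"],
}
database = {
    name: discretize(profile_from_alignment(rows))
    for name, rows in families.items()
}
index = build_index(database)
query = discretize(profile_from_alignment(["GMKVLAE", "GMKVLAE", "GMKILAE"]))
print(search(query, index))
```

Once every profile is reduced to a string, finding candidate matches becomes a dictionary lookup per word rather than a dynamic-programming pass per profile pair, which is the kind of saving the abstract attributes to hashing.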

CONCLUSION

Since PHOG-BLAST can compare multiple alignments of protein families, it can be used in different areas of comparative proteomics and protein evolution. For example, PHOG-BLAST helped to build the PHOG database of phylogenetic orthologous groups. An essential step in building this database was comparing the protein complements of different species and the orthologous groups of different taxa on a personal computer in reasonable time. When applied to detecting weak similarity between protein families, PHOG-BLAST is less precise than rigorous profile-profile comparison methods, but it runs much faster and can be used as a hit pre-selection tool.

