Department of Computer Science, Purdue University, West Lafayette, IN, USA.
Department of Biological Sciences, Purdue University, West Lafayette, IN, USA.
Bioinformatics. 2019 Mar 1;35(5):753-759. doi: 10.1093/bioinformatics/bty704.
Function annotation of proteins is fundamental to contemporary biology across fields including genomics, molecular biology, biochemistry, systems biology and bioinformatics. Function prediction is indispensable both for providing clues to interpret omics-scale data and for helping biologists build hypotheses when designing experiments. As genome sequencing is now routine owing to the rapid advancement of sequencing technologies, computational protein function prediction methods have become increasingly important. A conventional method of annotating a protein sequence is to transfer functions from the top hits of a homology search; however, this approach has substantial shortcomings, including low coverage in genome annotation.
Here we have developed Phylo-PFP, a new sequence-based protein function prediction method, which mines functional information from a broad range of similar sequences, including those with low sequence similarity identified by a PSI-BLAST search. To evaluate the functional similarity between the identified sequences and the query protein more accurately, Phylo-PFP reranks the retrieved sequences by considering their phylogenetic distance. Compared to its predecessor, PFP, which was among the top-ranked methods in the second round of the Critical Assessment of Functional Annotation (CAFA2), Phylo-PFP demonstrated substantial improvement in prediction accuracy. Phylo-PFP was further shown to outperform existing prediction programs that ranked at the top in CAFA2.
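The reranking idea can be illustrated with a minimal sketch. It is not the Phylo-PFP implementation: the abstract does not give the scoring function, so the `Hit` record, the precomputed `phylo_distance` field, and the `1 / (1 + distance)` weight are all assumptions chosen only to show how ordering hits by phylogenetic distance rather than by search score could change which function terms rise to the top.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One retrieved sequence (hypothetical record, not a Phylo-PFP type)."""
    seq_id: str
    e_value: float         # significance reported by the sequence search
    phylo_distance: float  # distance to the query in a tree, assumed precomputed
    go_terms: frozenset    # function terms annotated to this sequence


def rerank_by_phylo_distance(hits):
    """Order hits from phylogenetically closest to farthest from the query."""
    return sorted(hits, key=lambda h: h.phylo_distance)


def score_go_terms(hits):
    """Accumulate a score per term; closer hits contribute more.

    The 1 / (1 + distance) weight is an illustrative assumption,
    not the weighting used by Phylo-PFP.
    """
    scores = defaultdict(float)
    for hit in rerank_by_phylo_distance(hits):
        weight = 1.0 / (1.0 + hit.phylo_distance)
        for term in hit.go_terms:
            scores[term] += weight
    # Highest-scoring terms first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Illustrative data only: the hit with the best E-value is not the closest one.
example_hits = [
    Hit("seqA", 1e-50, 2.0, frozenset({"GO:0003824"})),
    Hit("seqB", 1e-3, 0.5, frozenset({"GO:0005515"})),
]
ranked_ids = [h.seq_id for h in rerank_by_phylo_distance(example_hits)]
term_scores = score_go_terms(example_hits)
```

In this toy input, `seqB` has the weaker E-value but the smaller phylogenetic distance, so it moves ahead of `seqA` and its term receives the larger weight.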
The Phylo-PFP web server is available at http://kiharalab.org/phylo_pfp.php.
Supplementary data are available at Bioinformatics online.