Department of Biochemistry and Microbiology, Rutgers University, New Brunswick, NJ, USA.
Computational Biology & Bioinformatics - i12 Informatics, Technical University of Munich (TUM), Munich, Germany.
Bioinformatics. 2018 Jul 1;34(13):i304-i312. doi: 10.1093/bioinformatics/bty262.
The rapid drop in sequencing costs has produced many more (predicted) protein sequences than can feasibly be functionally annotated with wet-lab experiments. Thus, many computational methods have been developed for this purpose. Most of these methods employ homology-based inference, approximated via sequence alignments, to transfer functional annotations between proteins. The increase in the number of available sequences, however, has drastically increased the search space, thus significantly slowing down alignment methods.
Here we describe homology-derived functional similarity of proteins (HFSP), a novel computational method that uses the results of a high-speed alignment algorithm, MMseqs2, to infer the functional similarity of proteins from their alignment length and sequence identity. We show that our method is accurate (85% precision) and fast (more than 40-fold faster than the state of the art). HFSP can help correct an error rate of at least 16% in legacy curations, even in a resource of as high quality as Swiss-Prot. These findings suggest HFSP as an ideal resource for large-scale functional annotation efforts.
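To make the idea of scoring an alignment by its length and sequence identity concrete, the following is a minimal sketch rather than the authors' implementation. It assumes that alignment hits have already been parsed into records holding percent identity (0 to 100) and alignment length, and that a hit supports annotation transfer when its identity lies at or above a length-dependent cutoff. The names `Hit`, `placeholder_threshold`, `score`, and `transfer_annotations` are hypothetical, and the cutoff curve is an illustrative placeholder, not the published HFSP curve.

```python
from typing import Callable, Dict, Iterable, NamedTuple


class Hit(NamedTuple):
    """One pairwise alignment hit (hypothetical record layout)."""
    query: str
    target: str
    percent_identity: float  # assumed to be on a 0-100 scale
    alignment_length: int    # number of aligned positions


def placeholder_threshold(alignment_length: int) -> float:
    """Illustrative length-dependent identity cutoff.

    ASSUMPTION: this is NOT the published HFSP curve. It only mimics the
    general idea that short alignments need higher identity than long ones.
    """
    if alignment_length <= 10:
        return 100.0
    # Decays with length and is floored at 30% identity.
    return max(30.0, 100.0 * (10.0 / alignment_length) ** 0.5)


def score(hit: Hit,
          threshold: Callable[[int], float] = placeholder_threshold) -> float:
    """Distance of the hit's identity above (positive) or below the cutoff."""
    return hit.percent_identity - threshold(hit.alignment_length)


def transfer_annotations(
    hits: Iterable[Hit],
    annotations: Dict[str, str],
    threshold: Callable[[int], float] = placeholder_threshold,
) -> Dict[str, str]:
    """Give each query the annotation of its best-scoring annotated target.

    ASSUMPTION: a score >= 0 is treated as sufficient evidence for transfer.
    """
    best: Dict[str, tuple] = {}
    for hit in hits:
        if hit.target not in annotations:
            continue  # nothing to transfer from an unannotated target
        s = score(hit, threshold)
        if s >= 0 and (hit.query not in best or s > best[hit.query][0]):
            best[hit.query] = (s, annotations[hit.target])
    return {query: annotation for query, (_, annotation) in best.items()}


# Toy data, invented for illustration only.
example_hits = [
    Hit("q1", "t1", 90.0, 400),
    Hit("q1", "t2", 35.0, 400),
    Hit("q2", "t1", 40.0, 20),
]
example_annotations = {"t1": "EC 1.1.1.1", "t2": "EC 2.7.1.1"}
predicted = transfer_annotations(example_hits, example_annotations)
```

In this toy run, `q1` inherits the annotation of `t1`, its highest-scoring target, while `q2` receives no annotation because 40% identity over only 20 aligned positions falls below the placeholder cutoff. Passing the cutoff as a function keeps the curve swappable, so a calibrated curve could replace the placeholder without touching the transfer logic.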
Supplementary data are available at Bioinformatics online.