Phylo-PFP: improved automated protein function prediction using phylogenetic distance of distantly related sequences.

Affiliations

Department of Computer Science, Purdue University, West Lafayette, IN, USA.

Department of Biological Sciences, Purdue University, West Lafayette, IN, USA.

Publication information

Bioinformatics. 2019 Mar 1;35(5):753-759. doi: 10.1093/bioinformatics/bty704.

DOI:10.1093/bioinformatics/bty704

PMID:30165572

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6394400/

Abstract

MOTIVATION

Function annotation of proteins is fundamental in contemporary biology across fields including genomics, molecular biology, biochemistry, systems biology and bioinformatics. Function prediction is indispensable for providing clues to interpret omics-scale data and for helping biologists build hypotheses when designing experiments. As genome sequencing is now routine owing to the rapid advancement of sequencing technologies, computational protein function prediction methods have become increasingly important. A conventional method of annotating a protein sequence is to transfer functions from the top hits of a homology search; however, this approach has substantial shortcomings, including low coverage in genome annotation.

RESULTS

Here we have developed Phylo-PFP, a new sequence-based protein function prediction method that mines functional information from a broad range of similar sequences, including those with low sequence similarity identified by a PSI-BLAST search. To evaluate functional similarity between the identified sequences and the query protein more accurately, Phylo-PFP reranks the retrieved sequences by considering their phylogenetic distance. Compared to its predecessor, PFP, which was among the top-ranked methods in the second round of the Critical Assessment of Functional Annotation (CAFA2), Phylo-PFP demonstrated substantial improvement in prediction accuracy. Phylo-PFP was further shown to outperform the prediction programs that were ranked at the top in CAFA2.
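The reranking idea described above can be illustrated with a minimal Python sketch. It is not the published Phylo-PFP scoring scheme, which the abstract does not specify. The `Hit` class, the `phylo_distance` field, the `1 / (1 + distance)` weight, and the toy hits are all assumptions made for illustration only: hits from a sequence search are reordered by a tree distance to the query, and each hit's GO terms receive a weight that shrinks as that distance grows.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One retrieved sequence (hypothetical structure, for illustration)."""
    seq_id: str
    e_value: float         # significance reported by the sequence search
    phylo_distance: float  # assumed tree distance between hit and query
    go_terms: tuple        # GO annotations attached to the hit


def rerank_by_phylo_distance(hits):
    """Order hits by ascending tree distance; ties are broken by E-value."""
    return sorted(hits, key=lambda h: (h.phylo_distance, h.e_value))


def score_go_terms(hits):
    """Sum a distance-based weight per GO term (illustrative weighting only)."""
    scores = {}
    for h in hits:
        weight = 1.0 / (1.0 + h.phylo_distance)  # closer hits count more
        for term in h.go_terms:
            scores[term] = scores.get(term, 0.0) + weight
    return scores


# Toy data: seqA has the best E-value but is the most distant in the tree.
hits = [
    Hit("seqA", 1e-50, 0.9, ("GO:0003824",)),
    Hit("seqB", 1e-5, 0.1, ("GO:0005515",)),
    Hit("seqC", 1e-20, 0.4, ("GO:0003824", "GO:0005515")),
]
ranked = rerank_by_phylo_distance(hits)
scores = score_go_terms(ranked)
```

In this toy case the hit with the weakest E-value (`seqB`) moves to the top because it is closest in the assumed tree, which is the kind of reordering that E-value ranking alone would not produce.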

AVAILABILITY AND IMPLEMENTATION

The Phylo-PFP web server is available at http://kiharalab.org/phylo_pfp.php.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
