The PFP and ESG protein function prediction methods in 2014: effect of database updates and ensemble approaches.

Authors

Khan Ishita K, Wei Qing, Chapman Samuel, Kc Dukka B, Kihara Daisuke

Affiliations

Department of Computer Sciences, Purdue University, West Lafayette, IN 47907 USA.

Department of Computational Science and Engineering, North Carolina A & T State University, Greensboro, NC 27411 USA.

Publication information

Gigascience. 2015 Sep 14;4:43. doi: 10.1186/s13742-015-0083-4. eCollection 2015.

DOI:10.1186/s13742-015-0083-4

PMID:26380077

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4570625/

Abstract

BACKGROUND

Functional annotation of novel proteins is one of the central problems in bioinformatics. As genome sequencing technologies continue to advance, more and more sequence information is becoming available to analyze and annotate. To achieve fast and automatic function annotation, many computational (automated) function prediction (AFP) methods have been developed. To objectively evaluate the performance of such methods on a large scale, community-wide assessment experiments have been conducted. The second round of the Critical Assessment of Function Annotation (CAFA) experiment was held in 2013-2014. The evaluation of participating groups was reported at a special interest group meeting of the Intelligent Systems for Molecular Biology (ISMB) conference in Boston in 2014. Our group participated in both CAFA1 and CAFA2 using multiple in-house AFP methods. Here, we report benchmark results for our methods, obtained while preparing for CAFA2 and before we submitted function predictions for the CAFA2 targets.

RESULTS

For CAFA2, we updated the annotation databases used by our methods, protein function prediction (PFP) and extended similarity group (ESG), and benchmarked their function prediction performance using both the original (older) and the updated databases. Performance evaluations of PFP under different settings and of ESG are discussed. We also developed two ensemble methods that combine function predictions from six independent, sequence-based AFP methods. We further analyzed the performance of our prediction methods when their predictions were enriched with the prior distribution of gene ontology (GO) terms. Examples of predictions by the ensemble methods are discussed.
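To make the idea of combining predictions from several component methods concrete, here is a minimal sketch of one generic way to do it: a weighted average of per-term confidence scores. The abstract does not describe how the two ensemble methods work, so this is not the authors' algorithm; the function name `consensus_scores`, the score format (GO-term label mapped to a confidence in [0, 1]), and the weights are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Optional


def consensus_scores(method_outputs: List[Dict[str, float]],
                     weights: Optional[List[float]] = None) -> Dict[str, float]:
    """Combine per-method term scores for one protein by weighted averaging.

    Hypothetical illustration only: each dict maps a term label to the
    confidence one component method assigns to it. A term that a method
    does not predict contributes a score of 0 from that method.
    """
    if weights is None:
        weights = [1.0] * len(method_outputs)
    if len(weights) != len(method_outputs):
        raise ValueError("need exactly one weight per method")
    total = sum(weights)
    combined: Dict[str, float] = defaultdict(float)
    for weight, output in zip(weights, method_outputs):
        for term, score in output.items():
            combined[term] += weight * score
    return {term: score / total for term, score in combined.items()}


# Three hypothetical component methods scoring placeholder terms.
outputs = [
    {"term_a": 0.8, "term_b": 0.2},
    {"term_a": 0.6},
    {"term_c": 0.9},
]
combined = consensus_scores(outputs)
for term in sorted(combined):
    print(term, round(combined[term], 3))
```

With equal weights, a term supported by several methods (here `term_a`) keeps a high combined score, while a term predicted confidently by only one method is damped by the methods that did not predict it.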

CONCLUSIONS

Updating the annotation database was successful, improving the Fmax prediction accuracy score for both PFP and ESG. Adding the prior distribution of GO terms yielded little improvement. Both of the ensemble methods we developed achieved a higher average Fmax score than every individual component method except ESG. Our benchmark results will not only complement the overall assessment to be carried out by the CAFA organizers, but also help elucidate the predictive power of sequence-based function prediction methods in general.
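The Fmax score referred to above is not defined in the abstract. The sketch below assumes the protein-centric definition commonly used in CAFA evaluations: at each score threshold t, precision is averaged over proteins that have at least one prediction with score >= t, recall is averaged over all benchmark proteins, and Fmax is the maximum over t of 2 * pr * rc / (pr + rc). The function name `fmax`, the input format, and the threshold grid are assumptions, and propagation of predictions to ancestor GO terms is deliberately left out.

```python
from typing import Dict, Set


def fmax(predictions: Dict[str, Dict[str, float]],
         truth: Dict[str, Set[str]],
         steps: int = 100) -> float:
    """Protein-centric Fmax over thresholds 1/steps, 2/steps, ..., 1.

    predictions: protein -> {term: confidence score in [0, 1]}
    truth:       protein -> non-empty set of true terms
    """
    best = 0.0
    for k in range(1, steps + 1):
        t = k / steps
        precisions = []      # only proteins with >= 1 prediction at t
        recall_sum = 0.0     # summed over all benchmark proteins
        for protein, true_terms in truth.items():
            scored = predictions.get(protein, {})
            predicted = {term for term, s in scored.items() if s >= t}
            hits = len(predicted & true_terms)
            if predicted:
                precisions.append(hits / len(predicted))
            recall_sum += hits / len(true_terms)
        if not precisions:
            continue  # no protein has a prediction at this threshold
        pr = sum(precisions) / len(precisions)
        rc = recall_sum / len(truth)
        if pr + rc > 0:
            best = max(best, 2 * pr * rc / (pr + rc))
    return best


# Toy data with placeholder term labels (not real GO identifiers).
predictions = {
    "P1": {"term_a": 0.9, "term_b": 0.4},
    "P2": {"term_c": 0.8},
}
truth = {
    "P1": {"term_a"},
    "P2": {"term_c", "term_d"},
}
print(round(fmax(predictions, truth), 3))  # 0.857
```

In this toy case the best threshold lies between 0.4 and 0.8, where precision is 1.0 and recall is 0.75, giving F = 6/7.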
