A close look at protein function prediction evaluation protocols.

Authors

Kahanda Indika, Funk Christopher S, Ullah Fahad, Verspoor Karin M, Ben-Hur Asa

Affiliations

Department of Computer Science, Colorado State University, Fort Collins, 80523 CO USA.

Computational Bioscience Program, University of Colorado School of Medicine, Aurora, 80045 CO USA.

Publication information

Gigascience. 2015 Sep 14;4:41. doi: 10.1186/s13742-015-0082-5. eCollection 2015.

DOI:10.1186/s13742-015-0082-5

PMID:26380075

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC4570743/

Abstract

BACKGROUND

The recently held Critical Assessment of Function Annotation challenge (CAFA2) required its participants to submit predictions for a large number of target proteins regardless of whether they have previous annotations or not. This is in contrast to the original CAFA challenge in which participants were asked to submit predictions for proteins with no existing annotations. The CAFA2 task is more realistic, in that it more closely mimics the accumulation of annotations over time. In this study we compare these tasks in terms of their difficulty, and determine whether cross-validation provides a good estimate of performance.

RESULTS

The CAFA2 task is a combination of two subtasks: making predictions on annotated proteins and making predictions on previously unannotated proteins. In this study we analyze the performance of several function prediction methods in these two scenarios. Our results show that several methods (structured support vector machine, binary support vector machines and guilt-by-association methods) do not usually achieve the same level of accuracy on these two tasks as that achieved by cross-validation, and that predicting novel annotations for previously annotated proteins is a harder problem than predicting annotations for uncharacterized proteins. We also find that different methods have different performance characteristics in these tasks, and that cross-validation is not adequate at estimating performance and ranking methods.
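The split into two subtasks can be illustrated with a minimal sketch. It assumes two annotation snapshots taken at different times, each mapping a protein ID to a set of GO term IDs. The function name `split_targets`, the snapshot layout, and the example IDs are hypothetical and not taken from the paper, and the sketch ignores propagation of annotations through the ontology hierarchy.

```python
from typing import Dict, Set, Tuple

# Assumed layout: protein ID -> set of GO term IDs at one point in time.
Annotations = Dict[str, Set[str]]


def split_targets(
    before: Annotations, after: Annotations
) -> Tuple[Annotations, Annotations]:
    """Partition proteins that gained annotations between two snapshots.

    Returns (previously_unannotated, previously_annotated); each maps a
    protein ID to the terms it gained after the first snapshot.
    """
    previously_unannotated: Annotations = {}
    previously_annotated: Annotations = {}
    for protein, terms_after in after.items():
        terms_before = before.get(protein, set())
        novel = terms_after - terms_before
        if not novel:
            continue  # nothing new to evaluate for this protein
        if terms_before:
            previously_annotated[protein] = novel
        else:
            previously_unannotated[protein] = novel
    return previously_unannotated, previously_annotated


# Hypothetical example data (IDs are placeholders).
before = {"P1": {"GO:0000001"}, "P2": set(), "P3": {"GO:0000002"}}
after = {
    "P1": {"GO:0000001", "GO:0000003"},  # annotated, gains a new term
    "P2": {"GO:0000004"},                # unannotated, gains first term
    "P3": {"GO:0000002"},                # unchanged, not a target
}
unannotated, annotated = split_targets(before, after)
```

Under these assumptions, evaluating a method separately on `unannotated` and `annotated` corresponds to the two scenarios compared in the study, whereas cross-validation would instead hold out proteins within a single snapshot.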

CONCLUSIONS

These results have implications for the design of computational experiments in the area of automated function prediction and can provide useful insight for the understanding and design of future CAFA competitions.
