Characterizing the state of the art in the computational assignment of gene function: lessons from the first critical assessment of functional annotation (CAFA).

Affiliation

Stanley Institute for Cognitive Genomics, Cold Spring Harbor Laboratory, 196 Genome Research Center, 500 Sunnyside Boulevard, Woodbury, NY 11797, USA.

Publication information

BMC Bioinformatics. 2013;14 Suppl 3(Suppl 3):S15. doi: 10.1186/1471-2105-14-s3-s15.

PMID:23630983

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3633048/

Abstract

The assignment of gene function remains a difficult but important task in computational biology. The first Critical Assessment of Functional Annotation (CAFA) was established to accelerate progress in the field. We present an independent analysis of the CAFA results, aimed at identifying challenges in assessment and at understanding trends in prediction performance. We found that well-accepted methods based on sequence similarity (i.e., BLAST) have a dominant effect. Many of the most informative predictions turned out either to recover existing knowledge about sequence similarity or to be "post-dictions" already documented in the literature. These results indicate that deep challenges remain in even defining the task of function assignment, with a particular difficulty posed by the problem of defining function in a way that is not dependent on either flawed gold standards or the input data itself. In particular, we suggest that using the Gene Ontology (or other similar systematizations of function) as a gold standard is unlikely to be the way forward.
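The abstract's central observation, that sequence-similarity transfer dominates and that scores depend heavily on the gold standard, can be made concrete with a toy sketch. The Python below is not BLAST and not the CAFA assessment code: it substitutes a k-mer Jaccard similarity for a BLAST alignment score, copies the annotations of the single most similar annotated sequence to a query, and then scores that prediction with simple per-protein precision and recall against a gold-standard term set. The sequences, protein IDs (`P_A`, `P_B`), and term labels (`termA` to `termC`, standing in for Gene Ontology terms) are hypothetical.

```python
from typing import Dict, Set, Tuple

# Hypothetical annotated reference set: protein ID -> (sequence, annotation terms).
# termA/termB/termC are placeholders for Gene Ontology terms, not real annotations.
ANNOTATED: Dict[str, Tuple[str, Set[str]]] = {
    "P_A": ("MKTAYIAKQRQISFVKSHFSRQ", {"termA", "termB"}),
    "P_B": ("GSHMLEDPVAAEKLAKELLEKG", {"termC"}),
}


def kmers(seq: str, k: int = 3) -> Set[str]:
    """Return the set of overlapping k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def transfer_annotations(
    query: str, annotated: Dict[str, Tuple[str, Set[str]]], k: int = 3
) -> Tuple[str, float, Set[str]]:
    """Copy the terms of the most similar annotated sequence.

    k-mer Jaccard similarity is used here only as a stand-in for a BLAST score.
    """
    query_kmers = kmers(query, k)
    best_id, best_score = max(
        ((pid, jaccard(query_kmers, kmers(seq, k))) for pid, (seq, _) in annotated.items()),
        key=lambda pair: pair[1],
    )
    return best_id, best_score, set(annotated[best_id][1])


def precision_recall(predicted: Set[str], gold: Set[str]) -> Tuple[float, float]:
    """Per-protein precision and recall of predicted terms against a gold-standard set."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


if __name__ == "__main__":
    query = "MKTAYIAKQRQISFVKSHFSRE"  # hypothetical; one substitution relative to P_A
    gold = {"termA", "termC"}  # hypothetical gold-standard annotation for the query
    best_id, score, predicted = transfer_annotations(query, ANNOTATED)
    print(best_id, round(score, 3), sorted(predicted))
    print(precision_recall(predicted, gold))
```

Because the query differs from `P_A` by a single residue, nearest-neighbour transfer picks `P_A` and inherits its terms; whether that counts as a good prediction depends entirely on what the gold-standard set contains, which is the dependence on gold standards that the abstract cautions against.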

Similar articles

Characterizing the state of the art in the computational assignment of gene function: lessons from the first critical assessment of functional annotation (CAFA).

BMC Bioinformatics. 2013;14 Suppl 3(Suppl 3):S15. doi: 10.1186/1471-2105-14-s3-s15.

Protein function prediction using text-based features extracted from the biomedical literature: the CAFA challenge.

BMC Bioinformatics. 2013;14 Suppl 3(Suppl 3):S14. doi: 10.1186/1471-2105-14-S3-S14. Epub 2013 Feb 28.

In-depth performance evaluation of PFP and ESG sequence-based function prediction methods in CAFA 2011 experiment.

BMC Bioinformatics. 2013;14 Suppl 3(Suppl 3):S2. doi: 10.1186/1471-2105-14-S3-S2. Epub 2013 Feb 28.

A large-scale evaluation of computational protein function prediction.

Nat Methods. 2013 Mar;10(3):221-7. doi: 10.1038/nmeth.2340. Epub 2013 Jan 27.

An expanded evaluation of protein function prediction methods shows an improvement in accuracy.

Genome Biol. 2016 Sep 7;17(1):184. doi: 10.1186/s13059-016-1037-6.

Protein function prediction by massive integration of evolutionary analyses and multiple data sources.

BMC Bioinformatics. 2013;14 Suppl 3(Suppl 3):S1. doi: 10.1186/1471-2105-14-S3-S1. Epub 2013 Feb 28.

Combining heterogeneous data sources for accurate functional annotation of proteins.

BMC Bioinformatics. 2013;14 Suppl 3(Suppl 3):S10. doi: 10.1186/1471-2105-14-S3-S10. Epub 2013 Feb 28.

Measuring gene functional similarity based on group-wise comparison of GO terms.

Bioinformatics. 2013 Jun 1;29(11):1424-32. doi: 10.1093/bioinformatics/btt160. Epub 2013 Apr 9.

MS-kNN: protein function prediction by integrating multiple data sources.

BMC Bioinformatics. 2013;14 Suppl 3(Suppl 3):S8. doi: 10.1186/1471-2105-14-S3-S8. Epub 2013 Feb 28.

PANNZER: high-throughput functional annotation of uncharacterized proteins in an error-prone environment.

Bioinformatics. 2015 May 15;31(10):1544-52. doi: 10.1093/bioinformatics/btu851. Epub 2015 Jan 8.

Cited by

Dynamic Retrieval Augmented Generation of Ontologies using Artificial Intelligence (DRAGON-AI).

J Biomed Semantics. 2024 Oct 17;15(1):19. doi: 10.1186/s13326-024-00320-3.

Fine-tuning protein embeddings for functional similarity evaluation.

Bioinformatics. 2024 Aug 2;40(8). doi: 10.1093/bioinformatics/btae445.

ProteInfer, deep neural networks for protein functional inference.

Elife. 2023 Feb 27;12:e80942. doi: 10.7554/eLife.80942.

Integrating unsupervised language model with triplet neural networks for protein gene ontology prediction.

PLoS Comput Biol. 2022 Dec 22;18(12):e1010793. doi: 10.1371/journal.pcbi.1010793. eCollection 2022 Dec.

TripletGO: Integrating Transcript Expression Profiles with Protein Homology Inferences for Gene Function Prediction.

Genomics Proteomics Bioinformatics. 2022 Oct;20(5):1013-1027. doi: 10.1016/j.gpb.2022.03.001. Epub 2022 May 11.

Multi-tissue DNA methylation microarray signature is predictive of gene function.

Epigenetics. 2022 Nov;17(11):1404-1418. doi: 10.1080/15592294.2022.2036411. Epub 2022 Feb 13.

PANNZER-A practical tool for protein function prediction.

Protein Sci. 2022 Jan;31(1):118-128. doi: 10.1002/pro.4193. Epub 2021 Oct 14.

The language of proteins: NLP, machine learning & protein sequences.

Comput Struct Biotechnol J. 2021 Mar 25;19:1750-1758. doi: 10.1016/j.csbj.2021.03.022. eCollection 2021.

SDN2GO: An Integrated Deep Learning Model for Protein Function Prediction.

Front Bioeng Biotechnol. 2020 Apr 29;8:391. doi: 10.3389/fbioe.2020.00391. eCollection 2020.

Novel comparison of evaluation metrics for gene ontology classifiers reveals drastic performance differences.

PLoS Comput Biol. 2019 Nov 4;15(11):e1007419. doi: 10.1371/journal.pcbi.1007419. eCollection 2019 Nov.
