Hierarchical deep learning for predicting GO annotations by integrating protein knowledge.

Affiliations

Bioengineering and Bioinformatics Research and Development Institute (IBB), FI-UNER, CONICET, Oro Verde 3100, Argentina.

Research Institute for Signals, Systems and Computational Intelligence (sinc(i)), FICH-UNL, CONICET, Ciudad Universitaria UNL, Santa Fe 3000, Argentina.

Publication information

Bioinformatics. 2022 Sep 30;38(19):4488-4496. doi: 10.1093/bioinformatics/btac536.

DOI: 10.1093/bioinformatics/btac536

PMID: 35929781

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9524999/

Abstract

MOTIVATION

Experimental testing and manual curation are the most precise ways of assigning the Gene Ontology (GO) terms that describe protein functions. However, they are expensive and time-consuming, and they cannot keep pace with the exponential growth of data generated by high-throughput sequencing methods. Hence, researchers need reliable computational systems that help fill this gap through automatic function prediction. The results of the latest Critical Assessment of Function Annotation (CAFA) challenge revealed that GO-term prediction remains a very challenging task. Recent developments in deep learning are pushing the frontiers of protein research and yielding new knowledge, thanks to the integration of data from multiple sources. However, the deep models developed so far for function prediction focus mainly on sequence data and have not yet achieved breakthrough performance.

RESULTS

We propose DeeProtGO, a novel deep-learning model for predicting GO annotations by integrating protein knowledge. DeeProtGO was trained to solve 18 different prediction problems, defined by the three GO sub-ontologies, the type of protein and the taxonomic kingdom. Our experiments showed that prediction quality improved as more protein knowledge was integrated. We also benchmarked DeeProtGO against state-of-the-art methods on public datasets and showed that it can effectively improve the prediction of GO annotations.
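The 18 prediction problems arise from crossing the three GO sub-ontologies with the protein types and the taxonomic kingdoms. The abstract does not list the latter two factors, so the sketch below assumes two protein types (CAFA's no-knowledge "NK" and limited-knowledge "LK" benchmark categories) and three kingdom groupings whose labels are hypothetical placeholders; it only illustrates how 3 × 2 × 3 = 18 problem definitions can be enumerated.

```python
from itertools import product

# The three GO sub-ontologies: biological process, molecular function,
# cellular component.
SUB_ONTOLOGIES = ["BPO", "MFO", "CCO"]

# ASSUMPTION: two protein types, borrowed from CAFA's benchmark categories
# (no-knowledge and limited-knowledge); not stated in the abstract.
PROTEIN_TYPES = ["NK", "LK"]

# ASSUMPTION: three taxonomic groupings; these labels are hypothetical
# placeholders, not the kingdoms used by DeeProtGO.
KINGDOMS = ["kingdom_1", "kingdom_2", "kingdom_3"]


def enumerate_problems():
    """Return every (sub-ontology, protein type, kingdom) combination."""
    return list(product(SUB_ONTOLOGIES, PROTEIN_TYPES, KINGDOMS))


if __name__ == "__main__":
    problems = enumerate_problems()
    print(len(problems))  # prints 18 (3 * 2 * 3)
    for ontology, protein_type, kingdom in problems[:3]:
        print(ontology, protein_type, kingdom)
```

Under these assumptions, each tuple would correspond to one separately trained model, so the sub-ontology, protein type and kingdom can each be held fixed when comparing results.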

AVAILABILITY AND IMPLEMENTATION

DeeProtGO and a use case are available at https://github.com/gamerino/DeeProtGO.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
