Hierarchical deep learning for predicting GO annotations by integrating protein knowledge.

Affiliations

Bioengineering and Bioinformatics Research and Development Institute (IBB), FI-UNER, CONICET, Oro Verde 3100, Argentina.

Research Institute for Signals, Systems and Computational Intelligence (sinc(i)), FICH-UNL, CONICET, Ciudad Universitaria UNL, Santa Fe 3000, Argentina.

Publication information

Bioinformatics. 2022 Sep 30;38(19):4488-4496. doi: 10.1093/bioinformatics/btac536.

Abstract

MOTIVATION

Experimental testing and manual curation are the most precise ways of assigning Gene Ontology (GO) terms that describe protein functions. However, they are expensive and time-consuming, and they cannot keep pace with the exponential growth of data generated by high-throughput sequencing methods. Researchers therefore need reliable computational systems to help close this gap through automatic function prediction. The results of the last Critical Assessment of Function Annotation challenge showed that predicting GO terms remains a very challenging task. Recent developments in deep learning are pushing the frontiers of protein research and producing new knowledge, thanks to the integration of data from multiple sources. However, the deep models developed so far for function prediction focus mainly on sequence data and have not yet achieved breakthrough performance.

RESULTS

We propose DeeProtGO, a novel deep-learning model for predicting GO annotations by integrating protein knowledge. DeeProtGO was trained to solve 18 different prediction problems, defined by the three GO sub-ontologies, the type of proteins and the taxonomic kingdom. In our experiments, prediction quality improved as more protein knowledge was integrated. We also benchmarked DeeProtGO against state-of-the-art methods on public datasets and showed that it can effectively improve the prediction of GO annotations.
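
GO terms are organised as a hierarchy, and an annotation to a term implies annotation to all of its ancestors. Predicted scores are therefore often post-processed so that no ancestor scores lower than its descendants. The sketch below illustrates that general idea only. It is not taken from DeeProtGO, and the term names (`root`, `A`, `B`, `C`), the `parents` mapping and the function `propagate_scores` are hypothetical names chosen for illustration.

```python
from typing import Dict, List


def propagate_scores(
    scores: Dict[str, float], parents: Dict[str, List[str]]
) -> Dict[str, float]:
    """Raise every ancestor's score to at least the score of its descendants.

    `scores` maps a term to its predicted score; `parents` maps a term to
    its direct parents in the hierarchy (a term may have several parents).
    Terms without a predicted score are treated as 0.0.
    """
    result = dict(scores)

    def push_up(term: str, value: float) -> None:
        # Walk towards the root, lifting any ancestor that scores below `value`.
        for parent in parents.get(term, []):
            if result.get(parent, 0.0) < value:
                result[parent] = value
                push_up(parent, value)

    for term, value in scores.items():
        push_up(term, value)
    return result


# Toy hierarchy (hypothetical): C has two parents, A and B, which share a root.
toy_parents = {"C": ["A", "B"], "A": ["root"], "B": ["root"]}
toy_scores = {"C": 0.9, "A": 0.4, "root": 0.2}
consistent = propagate_scores(toy_scores, toy_parents)
```

In this toy case the score of `C` is carried up to `A`, `B` and `root`, so the output respects the hierarchy even though the raw scores did not.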

AVAILABILITY AND IMPLEMENTATION

DeeProtGO and a use case are available at https://github.com/gamerino/DeeProtGO.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

