DeepGOZero: improving protein function prediction from sequence and zero-shot learning based on ontology axioms.

Affiliation

Computational Bioscience Research Center, Computer, Electrical and Mathematical Sciences & Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia.

Publication information

Bioinformatics. 2022 Jun 24;38(Suppl 1):i238-i245. doi: 10.1093/bioinformatics/btac256.

Abstract

MOTIVATION

Protein functions are often described using the Gene Ontology (GO), an ontology consisting of over 50 000 classes and a large set of formal axioms. Predicting the functions of proteins is one of the key challenges in computational biology, and a variety of machine learning methods have been developed for this purpose. However, these methods usually require a significant amount of training data and cannot make predictions for GO classes that have few or no experimental annotations.

RESULTS

We developed DeepGOZero, a machine learning model which improves predictions for functions with no or only a small number of annotations. To achieve this goal, we rely on a model-theoretic approach for learning ontology embeddings and combine it with neural networks for protein function prediction. DeepGOZero can exploit formal axioms in the GO to make zero-shot predictions, i.e., predict protein functions even if not a single protein in the training phase was associated with that function. Furthermore, the zero-shot prediction method employed by DeepGOZero is generic and can be applied whenever associations with ontology classes need to be predicted.
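The zero-shot idea can be illustrated with a toy sketch. This is not the DeepGOZero implementation: it assumes classes are embedded as balls (a center and a radius), that a protein is a point in the same space, and that an unseen class is defined by an axiom as the intersection of two classes that do have annotations. The class names, coordinates, the `membership` and `zero_shot_score` helpers, and the use of `min` as the conjunction are all assumptions made for illustration.

```python
import math

def membership(point, center, radius):
    """Return a score in (0, 1); it exceeds 0.5 when the point lies inside the ball."""
    dist = math.dist(point, center)
    return 1.0 / (1.0 + math.exp(-(radius - dist)))

# Hypothetical, hand-picked 2-D class embeddings (not learned, not from the paper).
classes = {
    "binding": ((0.0, 0.0), 2.0),
    "transport": ((3.0, 0.0), 2.0),
}

def zero_shot_score(point, parents):
    """Score a class with no annotations, defined as the intersection of `parents`.

    Taking the minimum of the parent scores is a simple stand-in for logical
    conjunction; it is an assumption of this sketch.
    """
    return min(membership(point, *classes[name]) for name in parents)

protein = (1.5, 0.0)  # hypothetical protein embedding
score = zero_shot_score(protein, ["binding", "transport"])
print(f"zero-shot score: {score:.3f}")
```

A protein that lies inside both parent balls receives a score above 0.5 for the unseen class, although no training example was ever labelled with that class.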

AVAILABILITY AND IMPLEMENTATION

http://github.com/bio-ontology-research-group/deepgozero.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
