Computational Bioscience Research Center, Computer, Electrical and Mathematical Sciences & Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia.
Bioinformatics. 2022 Jun 24;38(Suppl 1):i238-i245. doi: 10.1093/bioinformatics/btac256.
Protein functions are often described using the Gene Ontology (GO), an ontology consisting of over 50 000 classes and a large set of formal axioms. Predicting the functions of proteins is one of the key challenges in computational biology, and a variety of machine learning methods have been developed for this purpose. However, these methods usually require a significant amount of training data and cannot make predictions for GO classes that have only a few or no experimental annotations.
We developed DeepGOZero, a machine learning model that improves predictions for functions with few or no annotations. To achieve this goal, we rely on a model-theoretic approach for learning ontology embeddings and combine it with neural networks for protein function prediction. DeepGOZero can exploit formal axioms in the GO to make zero-shot predictions, i.e., predict protein functions even if not a single protein in the training phase was associated with that function. Furthermore, the zero-shot prediction method employed by DeepGOZero is generic and can be applied whenever associations with ontology classes need to be predicted.
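The zero-shot idea can be illustrated with a minimal sketch. It is not the DeepGOZero implementation: the scoring rule, the `score` function, the class identifiers `GO:seen` and `GO:unseen`, and all numbers are hypothetical and chosen only for illustration. The sketch assumes that every GO class has an embedding (here a center vector and a radius) positioned by ontology axioms, so that a class with no annotated proteins can still be scored against a protein vector produced by a neural network.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function mapping a real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def score(protein_vec, class_center, class_radius, relation_vec):
    """Hypothetical scoring rule (an assumption, not the paper's code):
    dot product of the protein vector with the class center translated
    by a relation vector, plus the class radius, squashed by a sigmoid."""
    target = [c + r for c, r in zip(class_center, relation_vec)]
    dot = sum(p * t for p, t in zip(protein_vec, target))
    return sigmoid(dot + class_radius)


# Toy class embeddings as (center, radius) pairs; all values are made up.
# "GO:unseen" stands for a class with no annotated proteins in training:
# its embedding would be placed by ontology axioms alone.
class_embeddings = {
    "GO:seen": ([1.0, 0.0], 0.5),
    "GO:unseen": ([0.9, 0.1], 0.3),
}
has_function = [0.0, 0.0]  # hypothetical relation vector
protein = [2.0, -1.0]      # hypothetical output of a protein encoder

# Every class with an embedding can be scored, annotated or not.
scores = {
    go_id: score(protein, center, radius, has_function)
    for go_id, (center, radius) in class_embeddings.items()
}
```

Because the score depends only on the class embedding and not on per-class annotation counts, the unseen class receives a prediction in the same way as the seen one. Any real system would learn the embeddings and the protein encoder jointly from data and axioms.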
The source code is available at http://github.com/bio-ontology-research-group/deepgozero.
Supplementary data are available at Bioinformatics online.