DeepGOZero: improving protein function prediction from sequence and zero-shot learning based on ontology axioms.

Affiliation

Computational Bioscience Research Center, Computer, Electrical and Mathematical Sciences & Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia.

Publication information

Bioinformatics. 2022 Jun 24;38(Suppl 1):i238-i245. doi: 10.1093/bioinformatics/btac256.

DOI:10.1093/bioinformatics/btac256
PMID:35758802
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9235501/
Abstract

MOTIVATION

Protein functions are often described using the Gene Ontology (GO), an ontology consisting of over 50 000 classes and a large set of formal axioms. Predicting the functions of proteins is one of the key challenges in computational biology, and a variety of machine learning methods have been developed for this purpose. However, these methods usually require a significant amount of training data and cannot make predictions for GO classes that have only a few or no experimental annotations.

RESULTS

We developed DeepGOZero, a machine learning model which improves predictions for functions with no or only a small number of annotations. To achieve this goal, we rely on a model-theoretic approach for learning ontology embeddings and combine it with neural networks for protein function prediction. DeepGOZero can exploit formal axioms in the GO to make zero-shot predictions, i.e., predict protein functions even if not a single protein in the training phase was associated with that function. Furthermore, the zero-shot prediction method employed by DeepGOZero is generic and can be applied whenever associations with ontology classes need to be predicted.
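The zero-shot idea can be illustrated with a toy geometric sketch. This is not the authors' implementation: it assumes, for illustration only, that each ontology class is a ball (center and radius), each relation is a translation vector, a protein is a point, and a class with no annotations is defined by an axiom of the form "parent and (relation some filler)". All class names, vectors, and function names below are hypothetical, and the real model learns its embeddings by training rather than using hand-set values.

```python
import math


def sigmoid(z):
    """Squash a margin into a (0, 1) score."""
    return 1.0 / (1.0 + math.exp(-z))


def ball_score(point, center, radius):
    """Score how well a point lies inside a class ball.

    Points inside the ball (distance < radius) score above 0.5.
    """
    return sigmoid(radius - math.dist(point, center))


def exists_score(point, relation, center, radius):
    """Score membership in 'relation some filler'.

    Assumption: the relation is a translation, so the point is shifted
    by the relation vector before being compared with the filler ball.
    """
    shifted = [p + r for p, r in zip(point, relation)]
    return ball_score(shifted, center, radius)


def zero_shot_score(point, parent, relation, filler):
    """Score an unannotated class defined as 'parent and (relation some filler)'.

    The intersection is approximated by the minimum of the two scores,
    which is a simplification chosen for this sketch.
    """
    return min(
        ball_score(point, *parent),
        exists_score(point, relation, *filler),
    )


# Hypothetical 2-D embeddings: (center, radius) for classes, a vector for the relation.
parent_class = ((0.0, 0.0), 1.0)
filler_class = ((1.0, 0.0), 0.5)
relation_vec = (1.0, 0.0)

near_protein = (0.1, 0.0)  # inside both regions
far_protein = (3.0, 3.0)   # outside both regions

near = zero_shot_score(near_protein, parent_class, relation_vec, filler_class)
far = zero_shot_score(far_protein, parent_class, relation_vec, filler_class)
print(near > 0.5, far < 0.5)
```

The point of the sketch is that the score for the defined class is computed entirely from the embeddings of the classes and relation named in its axiom, so no protein needs to have been annotated with that class.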

AVAILABILITY AND IMPLEMENTATION

http://github.com/bio-ontology-research-group/deepgozero.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/738d/9235501/dd3b5ba4b80b/btac256f4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/738d/9235501/91c5539baa43/btac256f1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/738d/9235501/d3887cb04972/btac256f2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/738d/9235501/bd1bb544dc1e/btac256f3.jpg
