Onto2Vec: joint vector-based representation of biological entities and their ontology-based annotations.

Affiliation

Computer, Electrical and Mathematical Sciences & Engineering Division (CEMSE), Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia.

Publication information

Bioinformatics. 2018 Jul 1;34(13):i52-i60. doi: 10.1093/bioinformatics/bty259.

DOI:10.1093/bioinformatics/bty259
PMID:29949999
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6022543/
Abstract

MOTIVATION

Biological knowledge is widely represented in the form of ontology-based annotations: ontologies describe the phenomena assumed to exist within a domain, and the annotations associate a (kind of) biological entity with a set of phenomena within the domain. The structure and information contained in ontologies and their annotations make them valuable for developing machine learning, data analysis and knowledge extraction algorithms; notably, semantic similarity is widely used to identify relations between biological entities, and ontology-based annotations are frequently used as features in machine learning applications.

RESULTS

We propose the Onto2Vec method, an approach to learn feature vectors for biological entities based on their annotations to biomedical ontologies. Our method can be applied to a wide range of bioinformatics research problems such as similarity-based prediction of interactions between proteins, classification of interaction types using supervised learning, or clustering. To evaluate Onto2Vec, we use the Gene Ontology (GO) and jointly produce dense vector representations of proteins, the GO classes to which they are annotated, and the axioms in GO that constrain these classes. First, we demonstrate that Onto2Vec-generated feature vectors can significantly improve prediction of protein-protein interactions in human and yeast. We then illustrate how Onto2Vec representations provide the means for constructing data-driven, trainable semantic similarity measures that can be used to identify particular relations between proteins. Finally, we use an unsupervised clustering approach to identify protein families based on their Enzyme Commission numbers. Our results demonstrate that Onto2Vec can generate high-quality feature vectors from biological entities and ontologies. Onto2Vec has the potential to significantly outperform the state-of-the-art in several predictive applications in which ontologies are involved.

AVAILABILITY AND IMPLEMENTATION

https://github.com/bio-ontology-research-group/onto2vec.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
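
As an illustration of the similarity-based prediction mentioned in the abstract, the following is a minimal sketch of how dense feature vectors could be compared with cosine similarity to rank candidate interaction partners. It is not the authors' implementation: the function names (`cosine_similarity`, `rank_candidate_partners`), the protein labels, and the three-dimensional toy vectors are all hypothetical, and real embeddings would come from a trained model rather than being written by hand.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_candidate_partners(query, embeddings):
    """Rank every other entity by cosine similarity to `query`, most similar first."""
    query_vec = embeddings[query]
    scores = [
        (name, cosine_similarity(query_vec, vec))
        for name, vec in embeddings.items()
        if name != query
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy vectors, invented for illustration only.
toy_embeddings = {
    "protein_A": [1.0, 0.0, 0.0],
    "protein_B": [0.9, 0.1, 0.0],
    "protein_C": [0.0, 1.0, 0.0],
}

ranking = rank_candidate_partners("protein_A", toy_embeddings)
for name, score in ranking:
    print(f"{name}\t{score:.4f}")
```

In this toy setting, a higher score would be read as stronger evidence for a relation between two proteins. A trainable similarity measure, as described in the abstract, would replace the fixed cosine function with a supervised model over pairs of vectors.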

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/40f2/6022543/4df327951532/bty259f1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/40f2/6022543/2fef32654bb7/bty259f2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/40f2/6022543/a0fb8906f0b4/bty259f3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/40f2/6022543/4f6fecf665c3/bty259f4.jpg