HiG2Vec: hierarchical representations of Gene Ontology and genes in the Poincaré ball.

Affiliations

Department of Computer Engineering, Ajou University, Suwon 16499, South Korea.

Department of Biostatistics, Epidemiology & Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA.

Publication information

Bioinformatics. 2021 Sep 29;37(18):2971-2980. doi: 10.1093/bioinformatics/btab193.

DOI: 10.1093/bioinformatics/btab193
PMID: 33760022
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10060726/

Abstract

MOTIVATION

Knowledge manipulation of Gene Ontology (GO) and Gene Ontology Annotation (GOA) can be done primarily by using vector representation of GO terms and genes. Previous studies have represented GO terms and genes or gene products in Euclidean space to measure their semantic similarity using an embedding method such as the Word2Vec-based method to represent entities as numeric vectors. However, this method has the limitation that embedding large graph-structured data in the Euclidean space cannot prevent a loss of information of latent hierarchies, thus precluding the semantics of GO and GOA from being captured optimally. On the other hand, hyperbolic spaces such as the Poincaré balls are more suitable for modeling hierarchies, as they have a geometric property in which the distance increases exponentially as it nears the boundary because of negative curvature.
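
The geometric property mentioned above can be illustrated with the standard distance formula for the Poincaré ball, d(u, v) = arcosh(1 + 2‖u − v‖² / ((1 − ‖u‖²)(1 − ‖v‖²))). The following is a minimal sketch and is not taken from the HiG2Vec repository; the function names and sample points are hypothetical, chosen only to show that a fixed Euclidean gap corresponds to a larger hyperbolic distance the closer the points lie to the boundary.

```python
import math


def poincare_distance(u, v):
    """Geodesic distance between two points strictly inside the unit ball.

    Illustrative helper (hypothetical name), using the standard formula
    arcosh(1 + 2*|u - v|^2 / ((1 - |u|^2) * (1 - |v|^2))).
    """
    sq_u = sum(x * x for x in u)
    sq_v = sum(x * x for x in v)
    if sq_u >= 1.0 or sq_v >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.acosh(1.0 + 2.0 * sq_diff / ((1.0 - sq_u) * (1.0 - sq_v)))


def euclidean_distance(u, v):
    """Ordinary Euclidean distance, for comparison."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


if __name__ == "__main__":
    # Pairs of points with the same Euclidean gap (0.05) at increasing radius.
    for r in (0.0, 0.5, 0.9):
        u, v = (r, 0.0), (r + 0.05, 0.0)
        print(
            f"radius {r:.2f}: "
            f"euclidean = {euclidean_distance(u, v):.3f}, "
            f"hyperbolic = {poincare_distance(u, v):.3f}"
        )
```

Because distances stretch toward the boundary, a tree-like hierarchy can place general terms near the origin and increasingly specific terms toward the rim with room to spare, which is the intuition behind using this space for GO.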

RESULTS

In this article, we propose hierarchical representations of GO and genes (HiG2Vec) by applying Poincaré embedding specialized in the representation of hierarchy through a two-step procedure: GO embedding and gene embedding. Through experiments, we show that our model represents the hierarchical structure better than other approaches and predicts the interaction of genes or gene products similar to or better than previous studies. The results indicate that HiG2Vec is superior to other methods in capturing the GO and gene semantics and in data utilization as well. It can be robustly applied to manipulate various biological knowledge.

AVAILABILITY AND IMPLEMENTATION

https://github.com/JaesikKim/HiG2Vec.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a0eb/10060726/4e692e431486/btab193f1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a0eb/10060726/7b85ad61ffe2/btab193f2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a0eb/10060726/09724894d344/btab193f3.jpg
