Hyperbolic hierarchical knowledge graph embeddings for biological entities.

Author information

Li Nan, Yang Zhihao, Yang Yumeng, Wang Jian, Lin Hongfei

Affiliations

College of Computer Science and Technology, Dalian University of Technology, Dalian, China.

Publication information

J Biomed Inform. 2023 Nov;147:104503. doi: 10.1016/j.jbi.2023.104503. Epub 2023 Sep 29.

Abstract

Predicting relationships between biological entities can greatly benefit important biomedical problems. Previous studies have represented biological entities and relationships in Euclidean space using embedding methods, which map entities to numerical vectors and evaluate their semantic similarity from those vectors. However, these methods cannot prevent the loss of latent hierarchical information when large graph-structured data are embedded into Euclidean space, and therefore cannot capture the semantics of entities and relationships accurately. Hyperbolic spaces, such as the Poincaré ball, are better suited to hierarchical modeling than Euclidean spaces because they have negative curvature: the space available grows exponentially with distance from the origin, and distances diverge as points approach the boundary of the ball. In this paper, we propose HEM, a hyperbolic hierarchical knowledge graph embedding model that generates vector representations of bio-entities. By encoding entities and relations in hyperbolic space, HEM captures latent hierarchical information and improves the accuracy of biological entity representations. Notably, HEM preserves rich information in fewer dimensions than methods that encode entities in Euclidean space. Furthermore, we evaluate HEM on protein-protein interaction prediction and gene-disease association prediction tasks. Experimental results show that HEM outperforms state-of-the-art baselines. The data and code are available at: https://github.com/Nan-ll/HEM.
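To make the hierarchy argument concrete, here is a minimal Python sketch of the standard geodesic distance on the unit Poincaré ball (curvature -1). It is not HEM's implementation: the function names `poincare_distance` and `sibling_ratios` are hypothetical, and HEM may use a different curvature or distance formulation. The sketch checks two properties: the distance from the origin diverges as a point approaches the boundary, and two "child" points placed near the boundary in different directions are nearly as far apart as the path through a "parent" at the origin, which is how distances behave in a tree.

```python
import math


def poincare_distance(x, y):
    """Geodesic distance between two points inside the unit Poincaré ball (curvature -1).

    d(x, y) = arcosh(1 + 2 * ||x - y||^2 / ((1 - ||x||^2) * (1 - ||y||^2)))
    """
    sq_norm_x = sum(c * c for c in x)
    sq_norm_y = sum(c * c for c in y)
    if sq_norm_x >= 1.0 or sq_norm_y >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    sq_diff = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.acosh(1.0 + 2.0 * sq_diff / ((1.0 - sq_norm_x) * (1.0 - sq_norm_y)))


def sibling_ratios(r):
    """Hypothetical toy hierarchy: a parent at the origin and two children at radius r
    in orthogonal directions. Returns d(a, b) / (d(parent, a) + d(parent, b)) under the
    hyperbolic and the Euclidean metric; a ratio near 1 means the children are about as
    far apart as the path through their parent, as in a tree."""
    parent, a, b = (0.0, 0.0), (r, 0.0), (0.0, r)
    hyp = poincare_distance(a, b) / (poincare_distance(parent, a) + poincare_distance(parent, b))
    euc = math.dist(a, b) / (math.dist(parent, a) + math.dist(parent, b))
    return hyp, euc


if __name__ == "__main__":
    for r in (0.5, 0.9, 0.99):
        d_origin = poincare_distance((0.0, 0.0), (r, 0.0))
        hyp, euc = sibling_ratios(r)
        print(f"r={r}: d(origin)={d_origin:.3f}  hyperbolic ratio={hyp:.3f}  euclidean ratio={euc:.3f}")
```

Along a radius the distance from the origin equals ln((1 + r)/(1 - r)), about 1.099, 2.944 and 5.293 for r = 0.5, 0.9 and 0.99. Over the same radii the hyperbolic sibling ratio climbs toward 1 (roughly 0.76, 0.88, 0.93), while the Euclidean ratio stays fixed at about 0.707. This tree-like behaviour is what lets a low-dimensional hyperbolic space hold a hierarchy that Euclidean space would distort.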
