College of Computer Science and Electronic Engineering, Hunan University, Changsha 410082, China.
School of Computer Science, National University of Defense Technology, Changsha 410073, China.
Bioinformatics. 2021 Dec 11;37(24):4793-4800. doi: 10.1093/bioinformatics/btab565.
Predicting entity relationships can greatly benefit important biomedical problems. Recently, a large number of biomedical heterogeneous networks (BioHNs) have been generated, offering opportunities to develop network-based learning approaches for predicting relationships among entities. However, current research has only sparsely explored BioHN-based self-supervised representation learning methods, and existing methods struggle to capture local- and global-level association information among entities simultaneously.
In this study, we propose a BioHN-based self-supervised representation learning approach for entity relationship prediction, termed BioERP. A self-supervised meta path detection mechanism is proposed to train a deep Transformer encoder that captures the global structure and semantic features of BioHNs. Meanwhile, a biomedical entity mask learning strategy is designed to reflect the local associations of vertices. Finally, the representations from the two task models are concatenated to generate two-level representation vectors for predicting relationships among entities. Results on eight datasets show that BioERP outperforms 30 state-of-the-art methods. In particular, BioERP achieves AUC and AUPR values close to 1 on drug-target interaction prediction. In summary, BioERP is a promising approach for bio-entity relationship prediction.
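The three ingredients above (meta path detection labels, entity masking, and concatenation of two-level representations) can be illustrated with a minimal, hypothetical Python sketch. The toy graph, the `type:name` vertex naming, all function names, and the 15% mask rate (borrowed from BERT-style masking) are assumptions for illustration only. This is not the BioERP implementation, which lives in the linked repository. The sketch omits the Transformer encoder entirely and only shows how self-supervised labels, masked inputs, and concatenated pair features could be formed.

```python
import random
from typing import Dict, List, Sequence, Set, Tuple

# Hypothetical toy BioHN: typed vertices written as "type:name", undirected edges.
EDGES = [
    ("drug:D1", "protein:P1"), ("drug:D2", "protein:P1"),
    ("protein:P1", "disease:S1"), ("drug:D2", "protein:P2"),
    ("protein:P2", "disease:S2"),
]

def build_graph(edges) -> Dict[str, Set[str]]:
    """Adjacency sets for an undirected heterogeneous graph."""
    graph: Dict[str, Set[str]] = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph

def vtype(v: str) -> str:
    # Assumption: the vertex type is encoded as a prefix before ":".
    return v.split(":", 1)[0]

def meta_path_label(path: Sequence[str], meta_path: Sequence[str],
                    graph: Dict[str, Set[str]]) -> int:
    """Self-supervised label: 1 if `path` is an instance of `meta_path`, else 0."""
    if len(path) != len(meta_path):
        return 0
    types_ok = all(vtype(v) == t for v, t in zip(path, meta_path))
    links_ok = all(b in graph.get(a, set()) for a, b in zip(path, path[1:]))
    return int(types_ok and links_ok)

def mask_entity(row: List[int], rng: random.Random,
                mask_rate: float = 0.15, mask_token: int = -1) -> Tuple[List[int], Dict[int, int]]:
    """Hide a random fraction of a vertex's association row.

    Returns the masked row and {position: original value}, i.e. the
    targets a model would be trained to recover.
    """
    masked = list(row)
    targets: Dict[int, int] = {}
    for i, value in enumerate(row):
        if rng.random() < mask_rate:
            targets[i] = value
            masked[i] = mask_token
    return masked, targets

def pair_features(emb_global: Dict[str, List[float]],
                  emb_local: Dict[str, List[float]], u: str, v: str) -> List[float]:
    """Concatenate global- and local-level vectors of both entities into one pair feature."""
    return emb_global[u] + emb_local[u] + emb_global[v] + emb_local[v]
```

For example, with `meta = ["drug", "protein", "disease"]`, the path `["drug:D1", "protein:P1", "disease:S1"]` is labelled 1, whereas `["drug:D1", "protein:P2", "disease:S2"]` is labelled 0 because D1 and P2 are not linked. A classifier trained to tell such paths apart must learn global, multi-hop structure, while recovering masked entries of a vertex's row depends on its local neighbourhood.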
Source code and data can be downloaded from https://github.com/pengsl-lab/BioERP.git.
Supplementary data are available at Bioinformatics online.