Kang Chuanze, Liu Zonghuan, Zhang Han
College of Artificial Intelligence, Nankai University, Tianjin 300350, China.
Bioinformatics. 2025 Feb 4;41(2). doi: 10.1093/bioinformatics/btaf023.
The drug-disease, gene-disease, and drug-gene relationships are high-frequency edge types that describe complex biological processes in biomedical knowledge graphs. The structural patterns formed by these three edge types are the graph motifs of (disease, drug, gene) triplets. Among them, the triangle is a stable and important motif in the network, and the other, non-triangle motifs also carry rich semantic relationships. However, existing methods focus only on triangle representation learning for classification and do not discriminate among the other motifs of triplets. A comprehensive method is needed to predict the various motifs within triplets, which would help uncover new pharmacological mechanisms and improve our understanding of disease-gene-drug interactions. Identifying complex motif structures within triplets can also help in studying the structural properties of triangles.
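To make the notion of a triplet motif concrete, the following minimal Python sketch labels a (disease, drug, gene) triplet by which of the three edge types are present. The function name, the bitmask encoding, and the placeholder node names are illustrative assumptions and are not taken from the TriMoGCL code. Three possible edges give 2^3 = 8 presence patterns, seven of them non-empty. Whether these seven coincide exactly with the motifs studied in the paper is also an assumption.

```python
from itertools import product


def motif_label(disease, drug, gene, drug_disease, gene_disease, drug_gene):
    """Encode which edges exist in a (disease, drug, gene) triplet as a 3-bit integer.

    bit 0 = drug-disease edge, bit 1 = gene-disease edge, bit 2 = drug-gene edge.
    0 means no edge at all; 7 (0b111) means the closed triangle.
    """
    present = (
        (drug, disease) in drug_disease,
        (gene, disease) in gene_disease,
        (drug, gene) in drug_gene,
    )
    return sum(int(flag) << bit for bit, flag in enumerate(present))


# Hypothetical toy graph; node names are placeholders, not real entities.
drug_disease = {("drug_A", "disease_X"), ("drug_B", "disease_X")}
gene_disease = {("gene_G", "disease_X")}
drug_gene = {("drug_A", "gene_G")}

triangle = motif_label("disease_X", "drug_A", "gene_G",
                       drug_disease, gene_disease, drug_gene)
open_path = motif_label("disease_X", "drug_B", "gene_G",
                        drug_disease, gene_disease, drug_gene)

# All non-empty presence patterns over three edge types.
non_empty_patterns = [p for p in product([0, 1], repeat=3) if any(p)]

print(triangle, open_path, len(non_empty_patterns))
```

Here the first triplet has all three edges (label 7, the triangle), while the second has the drug-disease and gene-disease edges but no drug-gene edge (label 3), which is one of the non-triangle motifs.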
We consider seven typical motifs within triplets and propose a graph contrastive learning-based method for triplet motif prediction (TriMoGCL). TriMoGCL uses a graph convolutional encoder to extract node features from the global network topology. Node pooling and edge pooling then extract context information from global and local views to form the triplet features. To reduce the redundant context information and the motif imbalance caused by dense edges, we apply node and class-prototype contrastive learning to denoise the triplet features and sharpen the discrimination between motifs. Experiments on two knowledge graphs of different scales demonstrate the effectiveness and reliability of TriMoGCL in identifying the various motif types. In addition, our model reveals new pharmacological mechanisms, providing a comprehensive analysis of triplet motifs.
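As a rough illustration of what a class-prototype contrastive objective can look like, the sketch below computes a generic prototype loss in plain Python: each class prototype is the normalized mean of the normalized features in that class, and each sample is pulled toward its own prototype and pushed away from the others through a softmax over cosine similarities. This is a common textbook form, not the exact loss used by TriMoGCL. The function names, the temperature value, and the toy features are all assumptions made for illustration.

```python
import math


def _normalize(vector):
    """Scale a vector to unit L2 norm (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vector)) or 1.0
    return [x / norm for x in vector]


def prototype_contrastive_loss(features, labels, temperature=0.5):
    """Mean of -log softmax(sim(z_i, prototype_k) / temperature) at k = label of i."""
    z = [_normalize(f) for f in features]

    # One prototype per class that actually occurs: normalized mean of its members.
    prototypes = {}
    for k in sorted(set(labels)):
        members = [zi for zi, y in zip(z, labels) if y == k]
        mean = [sum(column) / len(members) for column in zip(*members)]
        prototypes[k] = _normalize(mean)

    total = 0.0
    for zi, y in zip(z, labels):
        logits = {
            k: sum(a * b for a, b in zip(zi, p)) / temperature
            for k, p in prototypes.items()
        }
        # Numerically stable log-sum-exp over all prototypes.
        top = max(logits.values())
        log_denominator = top + math.log(
            sum(math.exp(v - top) for v in logits.values())
        )
        total += log_denominator - logits[y]
    return total / len(labels)


# Hypothetical 2-D triplet features forming two clusters.
toy_features = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [0.1, 1.0]]
loss_aligned = prototype_contrastive_loss(toy_features, [0, 0, 1, 1])
loss_mixed = prototype_contrastive_loss(toy_features, [0, 1, 0, 1])
print(loss_aligned, loss_mixed)
```

When the labels match the clusters, the loss is small. When each class mixes both clusters, the two prototypes coincide and the loss equals log 2, the value for an uninformative two-way choice. Because every class contributes one prototype regardless of its size, objectives of this kind are often used when class frequencies are imbalanced.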
Code and datasets are available at https://github.com/zhanglabNKU/TriMoGCL and https://doi.org/10.5281/zenodo.14633572.