College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410082, Hunan, China.
School of Computer Science, Xiangtan University, Xiangtan, 411105, Hunan, China.
BMC Biol. 2024 Oct 29;22(1):248. doi: 10.1186/s12915-024-02049-y.
Accurate prediction of compound-protein interaction (CPI) plays a crucial role in drug discovery. Existing data-driven methods learn from the chemical structures of compounds and proteins but ignore conceptual knowledge, that is, the interrelationships among the fundamental elements in the biomedical knowledge graph (KG). Knowledge graphs provide a comprehensive view of entities and relationships beyond individual compounds and proteins. They encompass a wealth of information, such as pathways, diseases, and biological processes, offering a richer context for CPI prediction. This contextual information can be used to identify indirect interactions, infer potential relationships, and improve prediction accuracy. In real-world applications, however, the prevalence of knowledge-missing compounds and proteins is a critical barrier to injecting knowledge into data-driven models.
Here, we propose BEACON, a data and knowledge dual-driven framework that bridges chemical structure and conceptual knowledge for CPI prediction. BEACON learns consistent representations by maximizing the mutual information between chemical structure and conceptual knowledge, and predicts missing representations by minimizing their conditional entropy. BEACON achieves state-of-the-art performance on multiple datasets compared to competing methods, notably with performance gains of 5.1% and 6.6% on the BIOSNAP and DrugBank datasets, respectively. Moreover, BEACON is the only approach capable of effectively predicting knowledge representations for knowledge-lacking compounds and proteins.
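The abstract does not state which estimators BEACON uses for its two objectives. As a minimal sketch, assuming the InfoNCE contrastive bound (a common mutual-information estimator, not necessarily BEACON's) for the "maximize mutual information" term and a Gaussian predictor for the "minimize conditional entropy" term, both objectives can be written over a batch of paired structure and knowledge embeddings. All names and parameters here (`info_nce_loss`, `mi_lower_bound`, `gaussian_cond_entropy_bound`, `linear_predictor`, `temperature`, `sigma`) are hypothetical and for illustration only.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def info_nce_loss(struct_emb, know_emb, temperature=0.1):
    """InfoNCE loss over paired embeddings (assumed estimator, not BEACON's).

    Row i of struct_emb (chemical structure) and row i of know_emb (KG
    knowledge) describe the same entity and form the positive pair; every
    other row in the batch acts as a negative.
    """
    n = len(struct_emb)
    total = 0.0
    for i in range(n):
        logits = [dot(struct_emb[i], know_emb[j]) / temperature for j in range(n)]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_sum_exp - logits[i]  # -log softmax probability of the positive
    return total / n


def mi_lower_bound(struct_emb, know_emb, temperature=0.1):
    # InfoNCE bound: I(S; K) >= log(n) - loss, so lowering the loss raises the bound.
    return math.log(len(struct_emb)) - info_nce_loss(struct_emb, know_emb, temperature)


def gaussian_cond_entropy_bound(pred_know, know_emb, sigma=1.0):
    """Upper bound on H(K | S) from a Gaussian predictor K ~ N(f(S), sigma^2 I).

    pred_know[i] = f(struct_emb[i]). Because H(K|S) <= E[-log q(K|S)] for any
    conditional density q, minimizing this Gaussian negative log-likelihood
    (equivalently, the squared error) tightens an upper bound on H(K | S).
    """
    n = len(know_emb)
    d = len(know_emb[0])
    total = 0.0
    for p, k in zip(pred_know, know_emb):
        sq_err = sum((a - b) ** 2 for a, b in zip(p, k))
        total += 0.5 * d * math.log(2 * math.pi * sigma ** 2) + sq_err / (2 * sigma ** 2)
    return total / n


def linear_predictor(weights, s):
    # Hypothetical linear map f(s) = W s, used to impute a knowledge embedding
    # for a compound or protein that has no KG entry.
    return [dot(row, s) for row in weights]
```

With aligned pairs such as `info_nce_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])` the loss is close to zero, while swapping the knowledge rows raises it to roughly 10. The linear predictor only stands in for whatever learned network maps structure to knowledge; for a knowledge-lacking entity, its output `f(s)` would serve as the missing knowledge representation.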
Overall, our work provides a general approach for directly injecting conceptual knowledge to enhance the performance of CPI prediction.