School of Computer Science and Engineering, Central South University, Changsha 410083, China.
Institute for Infocomm Research, Agency for Science, Technology and Research (A*STAR), Singapore 138632, Singapore.
Bioinformatics. 2024 Sep 1;40(Suppl 2):ii190-ii197. doi: 10.1093/bioinformatics/btae386.
Effective molecular representation is critical in drug development. The complex nature of molecules demands comprehensive multi-view representations, considering 1D, 2D, and 3D aspects, to capture diverse perspectives. Obtaining representations that encompass these varied structures is crucial for a holistic understanding of molecules in drug-related contexts.
In this study, we introduce MolMVC, a multi-view contrastive learning framework for molecular representation. First, we use a Transformer encoder to capture 1D sequence information and a Graph Transformer to encode the 2D and 3D structural details of molecules. Our approach incorporates a novel attention-guided augmentation scheme that leverages prior knowledge to create positive samples tailored to each molecular data view. To align multi-view positive samples effectively in latent space, we introduce an adaptive multi-view contrastive loss (AMCLoss). We compute AMCLoss at several levels within the model to capture the hierarchical nature of molecular information. Finally, we pre-train the encoders by minimizing AMCLoss to obtain a molecular representation that can be used for various downstream tasks. In our experiments, we evaluate MolMVC on multiple tasks, including molecular property prediction (MPP), drug-target binding affinity (DTA) prediction, and cancer drug response (CDR) prediction. The results demonstrate that the molecular representation learned by MolMVC improves predictive accuracy on these tasks while also reducing computational cost. Furthermore, we showcase MolMVC's efficacy in drug repositioning across a spectrum of drug-related applications.
The code and pre-trained model are publicly available at https://github.com/Hhhzj-7/MolMVC.
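To make the idea of aligning two views of the same molecules in latent space concrete, the following is a minimal sketch of a generic symmetric InfoNCE-style contrastive loss. It is not AMCLoss and is not taken from the MolMVC repository; the function names (`info_nce`, `cross_entropy_diag`, `normalize`), the temperature value, and the random stand-in embeddings are all illustrative assumptions.

```python
import math
import random


def normalize(v):
    """Scale a vector to unit L2 norm so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cross_entropy_diag(logits):
    """Mean over rows i of -log softmax(logits[i])[i] (the positive sits on the diagonal)."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the row maximum for numerical stability
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_z - row[i]
    return total / len(logits)


def info_nce(view_a, view_b, temperature=0.1):
    """Symmetric InfoNCE loss between two lists of embeddings of equal length.

    Item i of view_a and item i of view_b form a positive pair; every other
    item in the batch acts as a negative. The temperature is an assumed value.
    """
    a = [normalize(v) for v in view_a]
    b = [normalize(v) for v in view_b]
    logits = [[sum(x * y for x, y in zip(u, w)) / temperature for w in b] for u in a]
    logits_t = [list(col) for col in zip(*logits)]
    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(logits_t))


# Hypothetical stand-ins for encoder outputs (8 molecules, 16 dimensions each).
random.seed(0)
seq_emb = [[random.gauss(0, 1) for _ in range(16)] for _ in range(8)]
graph_emb_aligned = [[x + random.gauss(0, 0.01) for x in row] for row in seq_emb]
graph_emb_random = [[random.gauss(0, 1) for _ in range(16)] for _ in range(8)]

loss_aligned = info_nce(seq_emb, graph_emb_aligned)
loss_random = info_nce(seq_emb, graph_emb_random)
print(f"aligned views: {loss_aligned:.4f}  unrelated views: {loss_random:.4f}")
```

In this sketch the loss is lower when the two views of each molecule agree than when they are unrelated, which is the signal a contrastive pre-training objective minimizes. An adaptive, multi-level loss such as the one described above would presumably combine several such terms across views and model depths, but how it weights them is not specified here.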