Multimodal fused deep learning for drug property prediction: Integrating chemical language and molecular graph.

Author information

Lu Xiaohua, Xie Liangxu, Xu Lei, Mao Rongzhi, Xu Xiaojun, Chang Shan

Affiliation

Institute of Bioinformatics and Medical Engineering, Jiangsu University of Technology, Changzhou 213001, China.

Publication information

Comput Struct Biotechnol J. 2024 Apr 12;23:1666-1679. doi: 10.1016/j.csbj.2024.04.030. eCollection 2024 Dec.

DOI:10.1016/j.csbj.2024.04.030

PMID:38680871

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11046066/

Abstract

Accurately predicting molecular properties is a challenging but essential task in drug discovery. Recently, many mono-modal deep learning methods have been successfully applied to molecular property prediction. However, mono-modal learning is inherently limited as it relies solely on a single modality of molecular representation, which restricts a comprehensive understanding of drug molecules. To overcome the limitations, we propose a multimodal fused deep learning (MMFDL) model to leverage information from different molecular representations. Specifically, we construct a triple-modal learning model by employing Transformer-Encoder, Bidirectional Gated Recurrent Unit (BiGRU), and graph convolutional network (GCN) to process three modalities of information from chemical language and molecular graph: SMILES-encoded vectors, ECFP fingerprints, and molecular graphs, respectively. We evaluate the proposed triple-modal model using five fusion approaches on six molecule datasets, including Delaney, Llinas2020, Lipophilicity, SAMPL, BACE, and pKa from DataWarrior. The results show that the MMFDL model achieves the highest Pearson coefficients, and stable distribution of Pearson coefficients in the random splitting test, outperforming mono-modal models in accuracy and reliability. Furthermore, we validate the generalization ability of our model in the prediction of binding constants for protein-ligand complex molecules, and assess the resilience capability against noise. Through analysis of feature distributions in chemical space and the assigned contribution of each modal model, we demonstrate that the MMFDL model shows the ability to acquire complementary information by using proper models and suitable fusion approaches. By leveraging diverse sources of bioinformatics information, multimodal deep learning models hold the potential for successful drug discovery.
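
The abstract does not name the five fusion approaches or give implementation details, so the following is only a minimal sketch of one generic option, a weighted average of per-modality predictions (late fusion), scored with the Pearson coefficient used in the evaluation. Everything in it is an assumption made for illustration: the toy targets, the three noisy "modality" predictors standing in for the Transformer-Encoder, BiGRU, and GCN branches, the equal weights, and the helper names `pearson` and `fuse`. It is not the MMFDL implementation.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)


def fuse(predictions, weights):
    """Late fusion: weighted average of per-modality prediction lists.

    `predictions` holds one list of predicted values per modality;
    `weights` holds one non-negative contribution weight per modality.
    """
    total = sum(weights)
    n_samples = len(predictions[0])
    return [
        sum(w * p[i] for w, p in zip(weights, predictions)) / total
        for i in range(n_samples)
    ]


# Toy regression targets (hypothetical, e.g. a solubility-like property).
rng = random.Random(0)
y_true = [rng.uniform(-5.0, 2.0) for _ in range(500)]

# Three hypothetical mono-modal predictors, each with independent noise.
# They stand in for the SMILES, ECFP, and graph branches only by analogy.
preds = [[y + rng.gauss(0.0, 1.0) for y in y_true] for _ in range(3)]

# Equal contribution weights (an assumption; the paper assigns its own).
weights = [1.0, 1.0, 1.0]
y_fused = fuse(preds, weights)

single_r = [pearson(p, y_true) for p in preds]
fused_r = pearson(y_fused, y_true)
print("mono-modal Pearson:", [round(r, 3) for r in single_r])
print("fused Pearson:     ", round(fused_r, 3))
```

In this toy setup the errors of the three predictors are independent, so averaging cancels part of the noise and the fused prediction correlates more strongly with the targets than any single predictor. That mirrors the complementarity argument in the abstract only in spirit; real modality errors are correlated, which is why the choice of models and fusion approach matters.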

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0b07/11046066/0cfef1a7ca6a/ga1.jpg
