Guangxi Key Lab of Human-Machine Interaction and Intelligent Decision, Nanning Normal University, No. 175, Mingxiu East Road, Xixiang Tang District, Nanning 530001, China.
Center for Applied Mathematics of Guangxi, Nanning Normal University, 508 Xinning Road, Wuming District, Nanning 530100, China.
Brief Bioinform. 2024 May 23;25(4). doi: 10.1093/bib/bbae298.
Effective molecular representation learning is central to artificial intelligence-driven drug design because it determines the accuracy and efficiency of molecular property prediction and related molecular modeling tasks. However, previous studies often rely too heavily on a single molecular representation, fail to capture both local and global information in the molecular structure, and integrate multiscale features from different molecular representations ineffectively. These limitations prevent a complete and accurate representation of molecular structure and properties and ultimately reduce the accuracy of molecular property prediction. To address them, we propose MvMRL, a novel multi-view molecular representation learning method that incorporates feature information from multiple molecular representations and captures both local and global information from each view, thereby improving molecular property prediction. Specifically, MvMRL consists of four parts: a multiscale CNN-SE learning component for Simplified Molecular Input Line Entry System (SMILES) strings and a multiscale Graph Neural Network encoder, which extract local and global feature information from the SMILES view and the molecular graph view, respectively; a Multi-Layer Perceptron network, which captures complex non-linear relationship features from the molecular fingerprint view; and a dual cross-attention component, which deeply fuses the feature information from the multiple views to predict molecular properties. We evaluate MvMRL on 11 benchmark datasets, and the experimental results show that it outperforms state-of-the-art methods, indicating its rationality and effectiveness for molecular property prediction. The source code of MvMRL is available at https://github.com/jedison-github/MvMRL.
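The abstract does not say how the dual cross-attention component is wired, so the following is only a minimal sketch of generic bidirectional cross-attention between two views, not the authors' implementation. It assumes scaled dot-product attention without learned projection matrices, mean pooling, and concatenation of the two directions. The function names and the toy feature values are hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys_values):
    """Let each query vector attend over the rows of keys_values.

    Assumption: no learned projections, so keys and values are the same
    vectors and the scores are plain scaled dot products.
    """
    dim = len(queries[0])
    fused = []
    for q in queries:
        scores = [
            sum(a * b for a, b in zip(q, k)) / math.sqrt(dim)
            for k in keys_values
        ]
        weights = softmax(scores)
        # Weighted average of the other view's vectors.
        fused.append(
            [sum(w * v[j] for w, v in zip(weights, keys_values)) for j in range(dim)]
        )
    return fused


def mean_pool(rows):
    """Average a list of equal-length vectors into one vector."""
    count = len(rows)
    return [sum(r[j] for r in rows) / count for j in range(len(rows[0]))]


def dual_cross_attention(view_a, view_b):
    """Attend in both directions, pool each result, and concatenate."""
    a_from_b = cross_attention(view_a, view_b)
    b_from_a = cross_attention(view_b, view_a)
    return mean_pool(a_from_b) + mean_pool(b_from_a)


# Hypothetical toy features: 3 SMILES tokens and 2 graph atoms, 4 dimensions each.
smiles_feats = [[0.1, 0.2, 0.3, 0.4], [0.5, 0.1, 0.0, 0.2], [0.3, 0.3, 0.3, 0.3]]
graph_feats = [[0.2, 0.1, 0.4, 0.0], [0.0, 0.6, 0.1, 0.3]]

fused = dual_cross_attention(smiles_feats, graph_feats)
print(len(fused))  # 8: two pooled 4-dimensional vectors, concatenated
```

In a trained model, the queries, keys, and values would normally pass through learned linear projections, and the fused vector would feed a prediction head. Both are omitted here to keep the sketch self-contained.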