Wu Tianyu, Tang Yang, Sun Qiyu, Xiong Luolin
IEEE/ACM Trans Comput Biol Bioinform. 2023 Sep-Oct;20(5):3044-3055. doi: 10.1109/TCBB.2023.3253862. Epub 2023 Oct 9.
In recent years, artificial intelligence has played an important role in accelerating the whole process of drug discovery. Various molecular representation schemes of different modalities (e.g., textual sequences or graphs) have been developed. By encoding them digitally, different chemical information can be learned through corresponding network structures. Molecular graphs and the Simplified Molecular Input Line Entry System (SMILES) are currently popular means for molecular representation learning. Previous works have attempted to combine the two to address the loss of specific information inherent in single-modal representations across various tasks. To further fuse such multi-modal information, the correspondence between the chemical features learned from different representations should be considered. To this end, we propose a novel framework for molecular joint representation learning via Multi-Modal information from SMILES and molecular Graphs, called MMSG. We improve the self-attention mechanism by introducing bond-level graph representations as the attention bias in the Transformer, reinforcing the feature correspondence between multi-modal information. We further propose a Bidirectional Message Communication Graph Neural Network (BMC GNN) to strengthen the information flow aggregated from graphs for further combination. Extensive experiments on public property prediction datasets demonstrate the effectiveness of our model.
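The abstract does not include implementation details, so the following is only a minimal sketch of how a bond-level attention bias could be added to Transformer self-attention, in the spirit of the mechanism described above. The module name `BiasedSelfAttention`, the `bias_proj` projection, and the pairwise bond-feature layout are illustrative assumptions, not the authors' code.

```python
# Minimal sketch (not the MMSG implementation): self-attention over SMILES
# token features with an additive bias derived from bond-level graph features.
import torch
import torch.nn as nn
import torch.nn.functional as F


class BiasedSelfAttention(nn.Module):
    """Single-head self-attention where pairwise bond-level features
    (hypothetical layout) are projected to a scalar attention bias."""

    def __init__(self, d_model: int, d_bond: int):
        super().__init__()
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)
        # Maps a pairwise bond feature vector to a scalar attention bias (assumption).
        self.bias_proj = nn.Linear(d_bond, 1)
        self.scale = d_model ** -0.5

    def forward(self, tokens: torch.Tensor, bond_feats: torch.Tensor) -> torch.Tensor:
        # tokens:     (batch, seq_len, d_model)        SMILES token embeddings
        # bond_feats: (batch, seq_len, seq_len, d_bond) pairwise bond-level features
        #             aggregated from the molecular graph (assumed shape)
        q, k, v = self.q_proj(tokens), self.k_proj(tokens), self.v_proj(tokens)
        scores = torch.matmul(q, k.transpose(-1, -2)) * self.scale
        bias = self.bias_proj(bond_feats).squeeze(-1)   # (batch, seq_len, seq_len)
        attn = F.softmax(scores + bias, dim=-1)         # bias reinforces graph/SMILES correspondence
        return torch.matmul(attn, v)


# Toy usage with random inputs.
x = torch.randn(2, 16, 64)      # 2 molecules, 16 tokens, d_model = 64
b = torch.randn(2, 16, 16, 8)   # pairwise bond-level features, d_bond = 8
out = BiasedSelfAttention(64, 8)(x, b)
print(out.shape)  # torch.Size([2, 16, 64])
```

Adding the bias before the softmax (rather than after) keeps the attention weights normalized while still letting graph-derived bond information reshape which SMILES tokens attend to each other.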