Ji Zewei, Shi Runhan, Lu Jiarui, Li Fang, Yang Yang
Department of Computer Science and Engineering, and Key Laboratory of Shanghai Education Commission for Intelligent Interaction and Cognitive Engineering, Shanghai Jiao Tong University, Shanghai 200240, China.
J Chem Inf Model. 2022 Nov 28;62(22):5361-5372. doi: 10.1021/acs.jcim.2c00798. Epub 2022 Oct 27.
Molecular representation is critical to many prediction tasks involving the physicochemical properties of molecules and drug design. Because graph notations are commonly used to express the structural information of chemical compounds, graph neural networks (GNNs) have become the mainstream backbone model for learning molecular representations. However, the scarcity of task-specific labels in the biomedical domain limits the power of GNNs. Self-supervised pretraining for GNNs has recently been used to address this issue, but existing pretraining methods are mainly designed for graph data in general domains and do not consider the specific data properties of molecules. In this paper, we propose a representation learning method for molecular graphs, called ReLMole, which features hierarchical graph modeling of molecules and a contrastive learning scheme based on two-level graph similarities. We assess the performance of ReLMole on two types of downstream tasks, namely the prediction of molecular properties (MPs) and of drug-drug interactions (DDIs). ReLMole achieves promising results on all tasks. It outperforms the baseline models by over 2.6% in ROC-AUC averaged across six MP prediction tasks, and it improves the F1 score by 7-18% in DDI prediction for unseen drugs compared with other self-supervised models.