Department of Biomedical Engineering at Tsinghua University, China.
Ping An Healthcare Technology, Chaoyang, 100027 Beijing, China.
Brief Bioinform. 2021 Nov 5;22(6). doi: 10.1093/bib/bbab109.
How to produce expressive molecular representations is a fundamental challenge in artificial intelligence-driven drug discovery. Graph neural networks (GNNs) have emerged as a powerful technique for modeling molecular data. However, previous supervised approaches usually suffer from the scarcity of labeled data and poor generalization capability. Here, we propose a novel molecular pre-training graph-based deep learning framework, named MPG, that learns molecular representations from large-scale unlabeled molecules. In MPG, we propose a powerful GNN for modeling molecular graphs, named MolGNet, and design an effective self-supervised strategy for pre-training the model at both the node and graph levels. After pre-training on 11 million unlabeled molecules, we show that MolGNet can capture valuable chemical insights to produce interpretable representations. The pre-trained MolGNet can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of drug discovery tasks, including molecular property prediction, drug-drug interaction and drug-target interaction, on 14 benchmark datasets. The pre-trained MolGNet in MPG has the potential to become an advanced molecular encoder in the drug discovery pipeline.
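To make the idea of an encoder followed by a single task-specific output layer concrete, here is a minimal sketch in plain Python. It is not MolGNet and does not reproduce the MPG pre-training strategy: the sum-aggregation step, the mean-pool readout, the function names, the toy molecule and the weights are all assumptions chosen for illustration, and the sketch runs only a forward pass with no training.

```python
# Illustrative sketch only: a toy graph encoder plus one linear output layer.
# This is NOT MolGNet; every name and number below is hypothetical.

def message_pass(node_feats, edges):
    """One round of sum-aggregation: each atom adds its neighbours' features."""
    updated = [list(f) for f in node_feats]
    for u, v in edges:  # each bond is treated as undirected
        for k in range(len(node_feats[u])):
            updated[u][k] += node_feats[v][k]
            updated[v][k] += node_feats[u][k]
    return updated

def readout(node_feats):
    """Mean-pool node-level embeddings into one graph-level embedding."""
    n = len(node_feats)
    dim = len(node_feats[0])
    return [sum(f[k] for f in node_feats) / n for k in range(dim)]

def encode(node_feats, edges):
    """Stand-in for a pre-trained molecular encoder (assumed, untrained)."""
    return readout(message_pass(node_feats, edges))

def output_layer(embedding, weights, bias):
    """The single extra layer added for a downstream task: a linear map."""
    return sum(w * x for w, x in zip(weights, embedding)) + bias

# Toy molecule: three atoms in a chain (0-1-2) with 2-dimensional features.
atoms = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
bonds = [(0, 1), (1, 2)]

embedding = encode(atoms, bonds)
prediction = output_layer(embedding, weights=[0.5, -0.25], bias=0.1)
```

In a real setting the encoder weights would come from pre-training and would typically be updated together with the output layer during fine-tuning; the sketch only illustrates how a graph-level embedding feeds one additional layer.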