School of Computer Science and Engineering, Changchun University of Technology, North Yuanda Street No. 3000, Changchun, 130012 Jilin, China.
Key Laboratory for Bio-Electromagnetic Environment and Advanced Medical Theranostics, School of Biomedical Engineering and Informatics, Nanjing Medical University, Longmian Avenue No. 101, Nanjing, 211166 Jiangsu, China.
J Chem Inf Model. 2024 Nov 11;64(21):8322-8338. doi: 10.1021/acs.jcim.4c01061. Epub 2024 Oct 21.
Toxicity is paramount for comprehending compound properties, particularly in the early stages of drug design. Because toxic effects are diverse and complex, computational prediction of compound toxicity remains challenging. To address this challenge, we propose a multimodal representation learning model, termed the multimodal graph isomorphism network (MMGIN), for compound toxicity multitask learning. Using the fingerprints and molecular graphs of compounds, MMGIN learns a comprehensive compound representation through a two-channel structure that learns the fingerprint representation and the molecular graph representation independently. Subsequently, two feedforward neural networks use the learned multimodal compound representation to perform multitask learning, carrying out compound toxicity classification and multiple compound category classification simultaneously. To test the effectiveness of our model, we constructed a novel data set, termed the compound toxicity multitask learning (CTMTL) data set, derived from the TOXRIC data set. We compare MMGIN with other representative machine learning and deep learning models on the CTMTL and Tox21 data sets. The experimental results demonstrate significant improvements achieved by MMGIN. Furthermore, the ablation study underscores the effectiveness of the fingerprints, the molecular graphs, the multimodal representation learning model, and the multitask learning model, showcasing the model's superior predictive capability and robustness.
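The two-channel, two-head structure described in the abstract can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation. The layer sizes, the untrained random weights, the single-linear-layer GIN update, the sum readout, fusion by concatenation, the sigmoid outputs, and all function and parameter names (`gin_layer`, `forward`, `params`, and so on) are assumptions made for illustration only.

```python
import math
import random

def linear(x, w, b):
    """Dense layer: w is a list of rows (out_dim x in_dim)."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]

def relu(x):
    return [max(0.0, v) for v in x]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def init(out_dim, in_dim, rng):
    """Small random weights and zero biases (illustrative, untrained)."""
    w = [[rng.uniform(-0.5, 0.5) for _ in range(in_dim)] for _ in range(out_dim)]
    return w, [0.0] * out_dim

def gin_layer(h, adj, w, b, eps=0.0):
    """One GIN-style update: h_v <- ReLU(W((1 + eps) * h_v + sum of neighbour features) + b).
    A single linear layer stands in for the MLP (an assumption)."""
    out = []
    for v, hv in enumerate(h):
        agg = [(1.0 + eps) * x for x in hv]
        for u in adj[v]:
            agg = [a + x for a, x in zip(agg, h[u])]
        out.append(relu(linear(agg, w, b)))
    return out

def forward(fingerprint, node_feats, adj, params):
    # Channel 1: fingerprint bits -> dense embedding.
    fp_repr = relu(linear(fingerprint, *params["fp"]))
    # Channel 2: one GIN-style layer followed by a sum readout over atoms.
    h = gin_layer(node_feats, adj, *params["gin"])
    graph_repr = [sum(col) for col in zip(*h)]
    # Fuse the two modalities by concatenation (an assumption).
    fused = fp_repr + graph_repr
    # Two feedforward heads share the fused representation (multitask learning).
    toxicity = [sigmoid(z) for z in linear(fused, *params["tox_head"])]
    category = [sigmoid(z) for z in linear(fused, *params["cat_head"])]
    return toxicity, category

rng = random.Random(0)
FP_BITS, ATOM_DIM, HIDDEN, N_TOX, N_CAT = 8, 3, 4, 2, 3  # hypothetical sizes
params = {
    "fp": init(HIDDEN, FP_BITS, rng),
    "gin": init(HIDDEN, ATOM_DIM, rng),
    "tox_head": init(N_TOX, 2 * HIDDEN, rng),
    "cat_head": init(N_CAT, 2 * HIDDEN, rng),
}

# Toy three-atom chain 0-1-2 with one-hot atom features and a made-up fingerprint.
fingerprint = [1, 0, 1, 1, 0, 0, 1, 0]
node_feats = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
adj = {0: [1], 1: [0, 2], 2: [1]}

toxicity, category = forward(fingerprint, node_feats, adj, params)
print(len(toxicity), len(category))
```

In this sketch both heads read the same fused vector, so any training signal from one task would also shape the representation used by the other; that sharing is the usual motivation for a multitask setup of this kind.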