Bu Yingzi, Gao Ruoxi, Zhang Bohan, Zhang Luchen, Sun Duxin
Department of Pharmaceutical Sciences, College of Pharmacy, University of Michigan, Ann Arbor, Michigan 48109, United States.
Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, Michigan 48109, United States.
ACS Omega. 2023 Mar 27;8(14):13232-13242. doi: 10.1021/acsomega.3c00160. eCollection 2023 Apr 11.
The discovery of new drug candidates that inhibit an intended target is a complex and resource-intensive process. Machine learning (ML) methods for predicting drug-target interactions (DTI) are a potential way to improve its efficiency. However, traditional ML approaches have limited accuracy. In this study, we developed CoGT, a novel ensemble model for DTI prediction that uses a multilayer perceptron (MLP) to integrate graph-based models, which extract non-Euclidean molecular structure, with large pretrained models, specifically chemBERTa, which process simplified molecular-input line-entry system (SMILES) strings. The performance of CoGT was evaluated using compounds that inhibit four Janus kinases (JAKs). The results showed that the large pretrained model chemBERTa outperformed conventional ML models in predicting DTI across multiple evaluation metrics, while the graph neural network (GNN) was effective for prediction on imbalanced data sets. To take full advantage of the strengths of these different models, we developed the ensemble model CoGT, which outperformed the individual ML models in predicting compounds' inhibition of different JAK isoforms. Our data suggest that CoGT has the potential to accelerate drug discovery.
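The abstract does not say exactly how CoGT's MLP combines the graph-based and chemBERTa components. The sketch below assumes one plausible design, feature-level fusion. A GNN graph embedding and a chemBERTa SMILES embedding for the same compound are concatenated and passed through a small MLP, which outputs an inhibition probability for a single target. The class `FusionMLP`, its dimensions, and its randomly initialized weights are hypothetical. No GNN, chemBERTa model, or training loop is included, so both embeddings are assumed to be precomputed.

```python
import math
import random


def relu(x):
    # Element-wise rectified linear unit.
    return [max(0.0, v) for v in x]


def dense(x, weights, bias):
    # Fully connected layer: weights holds one row per output unit.
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def init_layer(n_in, n_out, rng):
    # Uniform initialization scaled by fan-in; biases start at zero.
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    bias = [0.0] * n_out
    return weights, bias


class FusionMLP:
    """Hypothetical fusion head (not the authors' code): concatenate a GNN
    graph embedding with a chemBERTa-style SMILES embedding, then apply a
    one-hidden-layer MLP that outputs an inhibition probability."""

    def __init__(self, gnn_dim, text_dim, hidden_dim, seed=0):
        rng = random.Random(seed)
        self.gnn_dim = gnn_dim
        self.text_dim = text_dim
        self.hidden = init_layer(gnn_dim + text_dim, hidden_dim, rng)
        self.out = init_layer(hidden_dim, 1, rng)

    def predict_proba(self, gnn_emb, text_emb):
        if len(gnn_emb) != self.gnn_dim or len(text_emb) != self.text_dim:
            raise ValueError("embedding dimension mismatch")
        x = list(gnn_emb) + list(text_emb)      # feature-level fusion
        h = relu(dense(x, *self.hidden))        # hidden layer
        logit = dense(h, *self.out)[0]          # single output unit
        return sigmoid(logit)


model = FusionMLP(gnn_dim=4, text_dim=6, hidden_dim=8)
print(model.predict_proba([0.1] * 4, [0.2] * 6))
```

Concatenation keeps each model's representation intact and lets the MLP learn how much weight to give each one. A stacking alternative, also an assumption here, would instead feed the base models' predicted probabilities to the MLP.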