Ouyang Xike, Feng Yannuo, Cui Chen, Li Yunhe, Zhang Li, Wang Han
School of Information Science and Technology, Institute of Computational Biology, Northeast Normal University, Changchun, Jilin 130117, China.
School of Computer Science and Engineering, Changchun University of Technology, Changchun, Jilin 130051, China.
Bioinformatics. 2024 Dec 26;41(1). doi: 10.1093/bioinformatics/btaf002.
Most drugs begin their journey inside the body by binding to the right target proteins, which is why considerable effort has been devoted to predicting drug-target binding during drug development. However, the inherent diversity of molecular properties, coupled with limited training data, challenges the accuracy and generalizability of these methods beyond their training domain.
In this work, we propose a neural network architecture for highly accurate and generalizable drug-target binding prediction, named Pre-trained Multi-view Molecular Representations (PMMR). The method uses pre-trained models to transfer representations of target proteins and drugs to the domain of drug-target binding prediction, mitigating the poor generalizability that stems from limited data. Two typical representations of drug molecules, graphs and SMILES strings, are then learned by a Graph Neural Network and a Transformer, respectively, so that local and global features complement each other. PMMR was evaluated on drug-target affinity and interaction benchmark datasets, where it outperformed peer methods, especially in generalizability under cold-start scenarios. Furthermore, a case study of cyclin-dependent kinase 2 indicated that the method has potential for drug discovery.
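The cold-start evaluation mentioned above can be illustrated with a minimal sketch. It assumes the common "cold-drug" convention, in which no drug in the test set appears in the training set; the abstract does not specify the exact splitting protocol used for PMMR, so the function name `cold_drug_split`, its parameters, and the toy identifiers and affinity values below are all hypothetical.

```python
import random
from typing import List, Tuple

# (drug_id, target_id, affinity); this layout is an assumption for illustration.
Pair = Tuple[str, str, float]


def cold_drug_split(
    pairs: List[Pair], test_frac: float = 0.2, seed: int = 0
) -> Tuple[List[Pair], List[Pair]]:
    """Split pairs so that every test drug is unseen during training."""
    # Split at the level of drugs, not individual pairs.
    drugs = sorted({drug for drug, _, _ in pairs})
    rng = random.Random(seed)
    rng.shuffle(drugs)

    # Hold out a fraction of the drugs (at least one) for testing.
    n_test = max(1, int(len(drugs) * test_frac))
    test_drugs = set(drugs[:n_test])

    train = [p for p in pairs if p[0] not in test_drugs]
    test = [p for p in pairs if p[0] in test_drugs]
    return train, test


# Toy data with made-up identifiers and values, for illustration only.
pairs = [
    ("D1", "T1", 5.0), ("D1", "T2", 6.1),
    ("D2", "T1", 7.3), ("D3", "T2", 4.8),
    ("D4", "T1", 5.5), ("D5", "T2", 6.9),
]
train, test = cold_drug_split(pairs, test_frac=0.4, seed=0)
```

A cold-target split would follow the same pattern with `p[1]` in place of `p[0]`. Splitting by entity rather than by pair is what makes the test set probe generalization to unseen molecules instead of interpolation among known ones.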