Gao Xu, Yan Mengfan, Zhang Chengwei, Wu Gang, Shang Jiandong, Zhang Congxiang, Yang Kecheng
School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou, China.
National Supercomputing Center in Zhengzhou, Zhengzhou, China.
Front Genet. 2025 Mar 20;16:1527300. doi: 10.3389/fgene.2025.1527300. eCollection 2025.
Determining drug-target affinity (DTA) is a pivotal step in drug discovery, where computational methods can significantly improve efficiency and reduce costs. Artificial intelligence (AI), especially deep learning models, can automatically extract high-dimensional features from the biological sequences of drug molecules and target proteins. This approach is less complex than traditional experimental methods for DTA prediction, particularly when handling large-scale data. In this study, we introduce a multimodal deep neural network model for DTA prediction, referred to as MDNN-DTA. The model employs Graph Convolutional Networks (GCN) and Convolutional Neural Networks (CNN) to extract features from the drug and protein sequences, respectively. One notable strength of our method is its ability to accurately predict DTA directly from the sequences of the target proteins, obviating the need for protein 3D structures, which are frequently unavailable in drug discovery. To comprehensively extract features from the protein sequence, we use an ESM pre-trained model to extract biochemical features and design a dedicated Protein Feature Extraction (PFE) block to capture both global and local features of the protein sequence. Furthermore, a Protein Feature Fusion (PFF) block is designed to improve the integration of the multi-scale protein features produced by the ESM model and the PFE block. We then compare MDNN-DTA with other models on the same dataset and conduct a series of ablation experiments to assess the contribution of each component. The results highlight the advantages and effectiveness of the MDNN-DTA method.
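To make the two input branches described in the abstract more concrete, the following is a minimal pure-Python sketch, not the authors' implementation. It shows (a) integer encoding of a protein sequence, the usual input to a sequence CNN, and (b) one standard graph-convolution step on a small molecular graph. It assumes that the drug is represented as a graph with atoms as nodes and bonds as edges, which the abstract does not state explicitly. All names (`encode_protein`, `normalized_adjacency`, `gcn_layer`, `mean_pool`), the toy molecule, and the weights are hypothetical. The ESM embedding, the PFE block, and the PFF block are not reproduced here.

```python
import math

# The 20 standard amino acids; index 0 is reserved for padding/unknown residues.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i + 1 for i, aa in enumerate(AMINO_ACIDS)}


def encode_protein(seq, max_len):
    """Map a protein sequence to a fixed-length list of integer ids."""
    ids = [AA_INDEX.get(ch, 0) for ch in seq.upper()[:max_len]]
    return ids + [0] * (max_len - len(ids))


def normalized_adjacency(num_atoms, bonds):
    """Return D^-1/2 (A + I) D^-1/2 for an undirected molecular graph."""
    a = [[1.0 if i == j else 0.0 for j in range(num_atoms)]
         for i in range(num_atoms)]
    for i, j in bonds:
        a[i][j] = a[j][i] = 1.0
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(num_atoms)]
            for i in range(num_atoms)]


def matmul(x, y):
    """Plain matrix product of two lists of lists."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]


def gcn_layer(a_hat, features, weights):
    """One graph-convolution step: ReLU(A_hat @ H @ W)."""
    h = matmul(matmul(a_hat, features), weights)
    return [[max(0.0, v) for v in row] for row in h]


def mean_pool(node_features):
    """Average node features into a single graph-level vector."""
    n = len(node_features)
    return [sum(col) / n for col in zip(*node_features)]


# Toy example (hypothetical): a C-C-O chain with one-hot [is_C, is_O] atom features.
bonds = [(0, 1), (1, 2)]
atom_features = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
identity_weights = [[1.0, 0.0], [0.0, 1.0]]  # stand-in for learned weights

a_hat = normalized_adjacency(3, bonds)
node_embeddings = gcn_layer(a_hat, atom_features, identity_weights)
drug_vector = mean_pool(node_embeddings)
protein_ids = encode_protein("MKT", max_len=5)
```

In a full model of this kind, the integer ids would typically pass through an embedding layer and convolutional filters, and the resulting protein vector would be combined with the drug vector before a regression head predicts the affinity.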