Dai Yang, Tan Xiaoyu, Wang Haoyu, Ma Gengchen, Xiong Yujie, Qiu Xihe
School of Electronic and Electrical Engineering, Shanghai University of Engineering Science, Shanghai, China.
School of Electronic and Electrical Engineering, Shanghai University of Engineering Science, Shanghai, China; Tencent Youtu Lab, Shanghai, Shanghai, China.
Comput Methods Programs Biomed. 2026 Feb 1;274:109163. doi: 10.1016/j.cmpb.2025.109163. Epub 2025 Nov 21.
Drug-target affinity (DTA) prediction is a pivotal task in computational drug discovery, enabling the estimation of binding affinities between small molecules and their target proteins. This process is essential for reducing the costs, development time, and risks inherent in traditional drug development pipelines. Current DTA prediction models primarily rely on separate extraction and concatenation of drug and protein features. However, these models often fail to account for the complex semantic relationships within protein sequences, which limits their ability to accurately predict affinity.
In response to these challenges, we propose MDM-DTA, a novel framework that uses a Mixture of Experts (MoE) strategy to integrate diverse molecular and protein representations. For drug representation, MDM-DTA processes molecular graphs with Message Passing Neural Networks (MPNNs) and passes molecular descriptors through a three-layer convolutional neural network (CNN). Protein features are extracted using a deep convolutional network enhanced with Squeeze-and-Excitation (SE) mechanisms to capture inter-channel dependencies. Furthermore, protein sequence semantics are encoded through pre-trained embeddings from a knowledge-guided Bidirectional Encoder Representations from Transformers (BERT) model and the Evolutionary Scale Modeling 2 (ESM2) model, allowing MDM-DTA to capture contextual relationships within protein sequences.
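As a rough illustration of two mechanisms named above, the following sketch shows SE channel recalibration and a soft MoE-style fusion of several feature vectors. It is not the authors' implementation: the abstract does not specify the gating design, layer sizes, or how the branches are combined, so the function names (`se_recalibrate`, `moe_fuse`), the softmax gate over concatenated expert outputs, the four expert labels, and all dimensions and weights are assumptions made for illustration only.

```python
import math
import random


def se_recalibrate(feature_map, w_reduce, w_expand):
    """Squeeze-and-Excitation over a [channels][length] feature map.

    w_reduce has shape [hidden][channels]; w_expand has shape [channels][hidden].
    Biases are omitted to keep the sketch short (an assumption).
    """
    # Squeeze: global average pooling over positions, one value per channel.
    squeezed = [sum(channel) / len(channel) for channel in feature_map]
    # Excitation: bottleneck layer with ReLU, then expansion with sigmoid.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed)))
              for row in w_reduce]
    gates = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
             for row in w_expand]
    # Scale: reweight every channel by its gate value in (0, 1).
    return [[g * x for x in channel] for g, channel in zip(gates, feature_map)]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def moe_fuse(expert_outputs, gate_weights):
    """Soft mixture of experts: gate on the concatenated outputs, then mix.

    expert_outputs: one equal-length vector per expert.
    gate_weights: shape [n_experts][n_experts * dim] (hypothetical linear gate).
    Returns the fused vector and the mixing weights.
    """
    gate_input = [x for out in expert_outputs for x in out]
    scores = [sum(w * x for w, x in zip(row, gate_input)) for row in gate_weights]
    mix = softmax(scores)
    dim = len(expert_outputs[0])
    fused = [sum(m * out[i] for m, out in zip(mix, expert_outputs))
             for i in range(dim)]
    return fused, mix


# Hypothetical setup: four experts mirroring the branches named in the abstract.
# Random vectors stand in for real encoder outputs.
rng = random.Random(0)
dim = 8
expert_names = ["drug_graph_mpnn", "drug_descriptor_cnn",
                "protein_se_cnn", "protein_pretrained"]
expert_outputs = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in expert_names]
gate_weights = [[rng.uniform(-0.1, 0.1) for _ in range(dim * len(expert_names))]
                for _ in expert_names]

fused, mix = moe_fuse(expert_outputs, gate_weights)
for name, weight in zip(expert_names, mix):
    print(f"{name}: {weight:.3f}")
```

In a full model the fused vector would typically feed a regression head that outputs the predicted affinity, and the gate and SE weights would be learned jointly with the encoders rather than fixed as they are here.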
Extensive experiments on three benchmark datasets demonstrate that MDM-DTA consistently outperforms state-of-the-art models of similar complexity in terms of predictive accuracy. The incorporation of both structural and semantic features significantly enhances the model's ability to predict drug-target binding affinities, highlighting the importance of a multi-modal representation approach.
The proposed MDM-DTA framework effectively integrates both molecular and semantic protein representations, providing superior performance in DTA prediction tasks. The results underscore the potential of MDM-DTA to improve the accuracy of computational drug discovery models, facilitating the identification of novel drug candidates and advancing the field of in silico drug development.