School of Software, Henan University, Kaifeng, Henan Province 475000, China.
Henan International Joint Laboratory of Intelligent Network Theory and Key Technology, Henan University, Kaifeng, Henan Province 475000, China.
Bioinformatics. 2024 Jun 28;40(Suppl 1):i539-i547. doi: 10.1093/bioinformatics/btae240.
In drug discovery, it is crucial to assess drug-target binding affinity (DTA). Although molecular docking is widely used, its computational cost limits its application in large-scale virtual screening. Deep learning-based methods learn virtual scoring functions from labeled datasets and can predict affinity quickly. However, existing deep learning methods have three limitations. First, they consider only the atom-bond graph or one-dimensional sequence representations of compounds, ignoring information about functional groups (pharmacophores) with specific biological activities. Second, because they rely on limited labeled datasets, they fail to learn comprehensive embedding representations of compounds and proteins, which leads to poor generalization in complex scenarios. Third, existing feature fusion methods cannot adequately capture contextual interaction information.
Therefore, we propose a novel DTA prediction method named HeteroDTA. Specifically, a multi-view compound feature extraction module is constructed to model both the atom-bond graph and the pharmacophore graph. The residue contact graph and the protein sequence are used to model protein structure and function. Moreover, to enhance generalization and reduce the dependence on task-specific labeled data, pre-trained models are used to initialize the atomic features of the compounds and the embedding representations of the protein sequence. A context-aware nonlinear feature fusion method is also proposed to learn interaction patterns between compounds and proteins. Experimental results on public benchmark datasets show that HeteroDTA significantly outperforms existing methods. In addition, HeteroDTA shows excellent generalization performance in cold-start experiments and superior representation learning of drug-target pairs. Finally, the effectiveness of HeteroDTA is demonstrated in a real-world drug discovery study.
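The cold-start setting mentioned above is commonly understood as evaluating on drugs (or targets) that never appear in the training data. The following is a minimal sketch of such a split, not the authors' evaluation code: the function name `cold_start_split`, the `(drug_id, target_id, affinity)` tuple layout, the 20% test fraction, and the toy affinity values are all assumptions made for illustration.

```python
import random
from typing import List, Tuple

# Hypothetical record layout: (drug_id, target_id, affinity)
Pair = Tuple[str, str, float]


def cold_start_split(
    pairs: List[Pair],
    entity: str = "drug",
    test_fraction: float = 0.2,
    seed: int = 0,
) -> Tuple[List[Pair], List[Pair]]:
    """Split pairs so that held-out entities never occur in the training set.

    entity="drug" holds out whole drugs; entity="target" holds out whole targets.
    """
    if entity not in ("drug", "target"):
        raise ValueError("entity must be 'drug' or 'target'")
    col = 0 if entity == "drug" else 1

    # Shuffle the unique entity ids deterministically, then hold out a fraction.
    ids = sorted({p[col] for p in pairs})
    random.Random(seed).shuffle(ids)
    n_test = max(1, round(len(ids) * test_fraction))
    held_out = set(ids[:n_test])

    # Every pair involving a held-out entity goes to the test set.
    train = [p for p in pairs if p[col] not in held_out]
    test = [p for p in pairs if p[col] in held_out]
    return train, test


# Toy data: 10 drugs x 3 targets with placeholder affinity values.
toy_pairs: List[Pair] = [
    (f"D{i}", f"T{j}", float(i + j)) for i in range(10) for j in range(3)
]

train_pairs, test_pairs = cold_start_split(toy_pairs, entity="drug", seed=1)
train_drugs = {d for d, _, _ in train_pairs}
test_drugs = {d for d, _, _ in test_pairs}
print("drugs shared between train and test:", len(train_drugs & test_drugs))
```

Splitting by entity rather than by pair is what makes the evaluation "cold": a random split over pairs would let the model see every drug and target during training and would overstate generalization to new compounds or proteins.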
The source code and data are available at https://github.com/daydayupzzl/HeteroDTA.