IEEE J Biomed Health Inform. 2023 Apr;27(4):2128-2137. doi: 10.1109/JBHI.2023.3240305.
Predicting drug-target affinity (DTA) is a crucial step in drug discovery. Efficient and accurate DTA prediction would greatly reduce the time and cost of new drug development, which has encouraged the emergence of a large number of deep learning-based DTA prediction methods. In terms of how they represent target proteins, current methods can be classified into 1D sequence-based and 2D protein graph-based methods. However, both approaches focus only on the inherent properties of the target protein and neglect the broad prior knowledge about protein interactions that has been clearly elucidated in past decades. To address this issue, this work presents an end-to-end DTA prediction method named MSF-DTA (Multi-Source Feature Fusion-based Drug-Target Affinity). The contributions can be summarized as follows. First, MSF-DTA adopts a novel "neighboring feature"-based protein representation. Instead of using only the inherent features of a target protein, MSF-DTA gathers additional information for the target protein from its biologically related "neighboring" proteins in protein-protein interaction (PPI) and sequence similarity (SSN) networks to obtain prior knowledge. Second, the representation is learned using an advanced graph pre-training framework, VGAE (variational graph auto-encoder), which not only gathers node features but also learns topological connections, thereby yielding a richer protein representation and benefiting the downstream DTA prediction task. This study provides a new perspective on the DTA prediction task, and evaluation results demonstrate that MSF-DTA achieves superior performance compared to current state-of-the-art methods.
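The "neighboring feature" idea can be illustrated with a minimal sketch. Note that this is not the MSF-DTA implementation: the abstract states that the representation is learned with VGAE, whereas the sketch below swaps in plain mean aggregation followed by concatenation, purely to show what gathering information from PPI and SSN neighbors might look like. The function names (`aggregate_neighbors`, `fuse_sources`), the protein identifiers, and all numbers are hypothetical.

```python
from typing import Dict, List

Vector = List[float]


def aggregate_neighbors(
    features: Dict[str, Vector],
    neighbors: Dict[str, List[str]],
    protein: str,
) -> Vector:
    """Average a protein's own features with those of its network neighbors.

    Assumption: simple mean pooling stands in for the learned VGAE encoder.
    """
    # The protein itself plus every neighbor that has a feature vector.
    members = [protein] + [n for n in neighbors.get(protein, []) if n in features]
    dim = len(features[protein])
    return [sum(features[m][i] for m in members) / len(members) for i in range(dim)]


def fuse_sources(
    features: Dict[str, Vector],
    ppi: Dict[str, List[str]],
    ssn: Dict[str, List[str]],
    protein: str,
) -> Vector:
    """Concatenate the PPI-based and SSN-based aggregated vectors."""
    return (
        aggregate_neighbors(features, ppi, protein)
        + aggregate_neighbors(features, ssn, protein)
    )


# Toy, made-up data: three proteins with 2-dimensional feature vectors.
features = {"P1": [1.0, 0.0], "P2": [3.0, 2.0], "P3": [5.0, 4.0]}
ppi = {"P1": ["P2"]}        # hypothetical: P1 interacts with P2
ssn = {"P1": ["P2", "P3"]}  # hypothetical: P1 is sequence-similar to P2 and P3

fused = fuse_sources(features, ppi, ssn, "P1")
print(fused)
```

In this toy setting the fused vector for `P1` has twice the input dimension, one half per network. A learned encoder such as VGAE would replace the fixed averaging with trainable graph convolutions and would additionally be trained to reconstruct the network's edges.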