Yang Zhihui, Liu Juan, Zhu Xuekai, Yang Feng, Zhang Qiang, Shah Hayat Ali
Institute of Artificial Intelligence, School of Computer Science, Wuhan University, Wuhan, 430072 China.
Front Comput Sci (Berl). 2023;17(5):175903. doi: 10.1007/s11704-022-2163-9. Epub 2022 Dec 13.
Predicting drug-protein binding is critical for virtual drug screening. Many deep learning methods have been proposed to predict drug-protein binding from protein sequences and sequence representations of drugs. However, most existing methods extract features from the protein and drug sequences separately, so they cannot learn features that characterize the drug-protein interaction itself. In addition, existing methods usually encode the protein (drug) sequence under the assumption that every amino acid (atom) contributes equally to binding, ignoring the different impacts of different amino acids (atoms). In practice, drug-protein binding usually occurs between conserved residue fragments in the protein sequence and atom fragments of the drug molecule, so a more comprehensive encoding strategy is needed to extract information from these conserved fragments. In this paper, we propose a novel model, named FragDPI, to predict drug-protein binding affinity. Unlike other methods, we encode the sequences based on the conserved fragments and encode the protein and drug into a unified vector. Moreover, we adopt a novel two-step training strategy to train FragDPI: the pre-training step learns the interactions between different fragments using unsupervised learning, and the fine-tuning step predicts binding affinities using supervised learning. The experimental results illustrate the superiority of FragDPI.
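As a rough illustration of the fragment-based, unified encoding described above, the following Python sketch splits a protein sequence and a drug SMILES string into fragments and joins them into one token sequence. It is not the FragDPI implementation: the greedy longest-match segmentation, the toy fragment vocabularies, the `[CLS]`/`[SEP]` markers, the `P:`/`D:` prefixes, and the function names `fragment_tokenize` and `build_unified_sequence` are all assumptions made for this example.

```python
from typing import List, Set


def fragment_tokenize(seq: str, vocab: Set[str], max_len: int) -> List[str]:
    """Greedy longest-match segmentation (an assumed strategy, not the paper's).

    At each position, take the longest substring (up to max_len characters)
    found in vocab; fall back to a single character if nothing matches.
    """
    tokens: List[str] = []
    i = 0
    while i < len(seq):
        for length in range(min(max_len, len(seq) - i), 0, -1):
            piece = seq[i:i + length]
            if length == 1 or piece in vocab:
                tokens.append(piece)
                i += length
                break
    return tokens


def build_unified_sequence(
    protein: str,
    smiles: str,
    protein_vocab: Set[str],
    drug_vocab: Set[str],
    max_len: int = 4,
) -> List[str]:
    """Join protein and drug fragments into one token sequence.

    The "P:"/"D:" prefixes keep the two fragment vocabularies distinct, and
    the [CLS]/[SEP] markers are hypothetical special tokens, so that a single
    model could attend over both molecules at once.
    """
    protein_tokens = ["P:" + t for t in fragment_tokenize(protein, protein_vocab, max_len)]
    drug_tokens = ["D:" + t for t in fragment_tokenize(smiles, drug_vocab, max_len)]
    return ["[CLS]"] + protein_tokens + ["[SEP]"] + drug_tokens


# Toy vocabularies, invented for illustration only.
PROTEIN_FRAGMENTS = {"MKT", "AYI"}
DRUG_FRAGMENTS = {"CC", "(=O)"}

if __name__ == "__main__":
    print(build_unified_sequence("MKTAYIAK", "CC(=O)O", PROTEIN_FRAGMENTS, DRUG_FRAGMENTS))
```

In a two-step scheme like the one the abstract describes, a sequence of this kind could first be used for an unsupervised objective over the fragments and then for supervised affinity regression; the sketch covers only the encoding step.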
Supplementary material is available for this article at 10.1007/s11704-022-2163-9 and is accessible for authorized users.