DeepCDA: deep cross-domain compound-protein affinity prediction through LSTM and convolutional neural networks.

Affiliations

Laboratory of Systems Biology and Bioinformatics (LBB), Institute of Biochemistry and Biophysics, University of Tehran, Tehran 1417614411, Iran.

Department of Computer Science and Information Technology, Institute for Advanced Studies in Basic Sciences (IASBS), Zanjan 4513766731, Iran.

Publication information

Bioinformatics. 2020 Nov 1;36(17):4633-4642. doi: 10.1093/bioinformatics/btaa544.

Abstract

MOTIVATION

An essential part of drug discovery is the accurate prediction of the binding affinity of new compound-protein pairs. Most standard computational methods assume that the compounds or proteins in the test data were observed during the training phase. In real-world situations, however, the test and training data are sampled from different domains with different distributions. To cope with this challenge, we propose a deep learning-based approach that consists of three phases. In the first phase, a training encoder network learns a novel representation of compounds and proteins. To this end, we combine convolutional layers and long short-term memory (LSTM) layers so that the occurrence patterns of local substructures along protein and compound sequences are learned. To encode the interaction strength between protein and compound substructures, we also propose a two-sided attention mechanism. In the second phase, to deal with the different distributions of the training and test domains, a feature encoder network is learned for the test domain using an adversarial domain adaptation approach. In the third phase, the learned test encoder network is applied to new compound-protein pairs to predict their binding affinity.
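The abstract does not spell out how the two-sided attention is computed. The following is a minimal, hypothetical sketch, assuming an attentive-pooling style formulation: given one feature vector per protein substructure and one per compound substructure (for example, CNN+LSTM outputs), it scores every protein/compound pair as tanh(p^T W c), max-pools those scores along each axis, and softmax-normalizes them into separate attention weights for the protein side and the compound side. The function names, the bilinear matrix `W` (learned in a real model, identity by default here) and the toy inputs are illustrative assumptions and are not taken from the DeepCDA code.

```python
import math
from typing import List, Optional, Tuple

Vector = List[float]
Matrix = List[List[float]]


def softmax(logits: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def bilinear(p: Vector, W: Matrix, c: Vector) -> float:
    """Compute p^T W c, with W of shape len(p) x len(c)."""
    return sum(p[i] * sum(W[i][j] * c[j] for j in range(len(c))) for i in range(len(p)))


def weighted_sum(weights: Vector, rows: Matrix) -> Vector:
    """Attention-pool a list of feature vectors with the given weights."""
    return [sum(w * row[k] for w, row in zip(weights, rows)) for k in range(len(rows[0]))]


def two_sided_attention(
    protein: Matrix, compound: Matrix, W: Optional[Matrix] = None
) -> Tuple[Vector, Vector, Vector, Vector]:
    """Hypothetical two-sided attention over substructure features.

    protein:  n feature vectors, one per protein substructure
    compound: m feature vectors, one per compound substructure
    W:        bilinear interaction matrix (assumed; identity if omitted)
    Returns (pooled protein vector, pooled compound vector,
             protein attention weights, compound attention weights).
    """
    d_p, d_c = len(protein[0]), len(compound[0])
    if W is None:
        if d_p != d_c:
            raise ValueError("identity W requires equal feature dimensions")
        W = [[1.0 if i == j else 0.0 for j in range(d_c)] for i in range(d_p)]

    # scores[i][j]: interaction strength of protein substructure i with compound substructure j
    scores = [[math.tanh(bilinear(p, W, c)) for c in compound] for p in protein]

    # Protein side: each substructure is scored by its strongest compound partner
    protein_weights = softmax([max(row) for row in scores])
    # Compound side: each substructure is scored by its strongest protein partner
    compound_weights = softmax([max(col) for col in zip(*scores)])

    return (
        weighted_sum(protein_weights, protein),
        weighted_sum(compound_weights, compound),
        protein_weights,
        compound_weights,
    )


if __name__ == "__main__":
    protein = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # 3 toy protein substructures
    compound = [[1.0, 0.0], [0.0, 0.2]]              # 2 toy compound substructures
    p_vec, c_vec, p_w, c_w = two_sided_attention(protein, compound)
    print([round(w, 3) for w in p_w], [round(w, 3) for w in c_w])
```

In this toy run, the first protein substructure and the first compound substructure interact most strongly, so each receives the largest weight on its own side. Keeping two separate weight vectors, rather than one joint softmax over all pairs, lets each side highlight its own key substructures. That is one plausible reading of "two-sided", not necessarily the authors' exact design.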

RESULTS

To evaluate the proposed approach, we applied it to the KIBA, Davis and BindingDB datasets. The results show that, in more challenging situations, the proposed method learns a more reliable model for the test domain.

AVAILABILITY AND IMPLEMENTATION

https://github.com/LBBSoft/DeepCDA.
