Key Lab of Molecular Biophysics of Ministry of Education, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China.
Bioinformatics. 2024 Jun 3;40(6). doi: 10.1093/bioinformatics/btae319.
Predicting protein-ligand binding affinity is crucial in new drug discovery and development. However, most existing models depend on 3D protein structures, which are often difficult to obtain. Effectively combining amino acid sequences with ligand sequences and better highlighting active sites also remain significant challenges.
We propose a neural network model called DEAttentionDTA, based on dynamic word embeddings and a self-attention mechanism, for predicting protein-ligand binding affinity. DEAttentionDTA takes 1D sequence information as input: the global sequence features of the protein's amino acids, the local features of the active pocket site, and a linear representation of the ligand molecule in SMILES format. These three linear sequences are fed into a dynamic word-embedding layer based on a 1D convolutional neural network for embedding encoding and are then correlated through a self-attention mechanism. A linear layer generates the predicted affinity value. We compared DEAttentionDTA with various mainstream tools, and it achieved significantly better results on the same dataset. We then assessed the performance of the model on the p38 protein family.
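The data flow described above (three 1D sequences, an embedding step, self-attention across all tokens, and a final linear layer) can be illustrated with a small standard-library sketch. It is not the authors' implementation: a fixed random lookup table stands in for the 1D-CNN dynamic word-embedding layer, the attention uses identity query/key/value projections, SMILES strings are split per character (so two-letter atoms such as Cl are not kept together), and all weights are untrained, so the returned score carries no chemical meaning. The names `build_vocab`, `encode`, `self_attention`, and `predict_affinity`, as well as the example sequences, are hypothetical.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
SMILES_CHARS = "BCFHINOPSbcnoslr()[]=#@+-/\\0123456789"  # assumed character set


def build_vocab(chars):
    """Assign ids 1..n to the unique characters; id 0 is reserved for unknowns."""
    return {ch: i + 1 for i, ch in enumerate(sorted(set(chars)))}


def encode(seq, vocab, max_len):
    """Truncate to max_len and map each character to its integer id."""
    return [vocab.get(ch, 0) for ch in seq[:max_len]]


def make_embedding(num_ids, dim, rng):
    """Random lookup table; a stand-in for a learned embedding layer."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(num_ids)]


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x):
    """Scaled dot-product self-attention with identity Q/K/V projections."""
    d = len(x[0])
    outputs, weights = [], []
    for q in x:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in x]
        w = softmax(scores)
        weights.append(w)
        outputs.append([sum(w[j] * x[j][c] for j in range(len(x))) for c in range(d)])
    return outputs, weights


def predict_affinity(protein, pocket, smiles, dim=8, max_len=64, seed=0):
    """Toy forward pass: embed three sequences, attend, mean-pool, apply a linear layer."""
    rng = random.Random(seed)
    aa_vocab, smi_vocab = build_vocab(AMINO_ACIDS), build_vocab(SMILES_CHARS)
    aa_table = make_embedding(len(aa_vocab) + 1, dim, rng)
    smi_table = make_embedding(len(smi_vocab) + 1, dim, rng)

    # Concatenate the token embeddings of the three input streams.
    tokens = [aa_table[i] for i in encode(protein, aa_vocab, max_len)]
    tokens += [aa_table[i] for i in encode(pocket, aa_vocab, max_len)]
    tokens += [smi_table[i] for i in encode(smiles, smi_vocab, max_len)]

    attended, weights = self_attention(tokens)
    pooled = [sum(col) / len(attended) for col in zip(*attended)]

    # Untrained linear layer producing a single scalar.
    w = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
    return sum(p * wi for p, wi in zip(pooled, w)) + 0.0, weights
```

Calling `predict_affinity("MSQERPTFYRQELNKTIWEV", "TFYRQ", "CC(=O)Nc1ccc(O)cc1")` returns a scalar and the attention matrix, in which each row shows how strongly one token (protein, pocket, or ligand) attends to every other token. A trained model would learn the embedding, projection, and output weights rather than sampling them.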
The source code is available at https://github.com/whatamazing1/DEAttentionDTA.