BiGRUD-SA: Protein S-sulfenylation sites prediction based on BiGRU and self-attention.

Affiliations

College of Computer Science and Technology, Shandong University, Qingdao, 266237, China; College of Information Science and Technology, School of Data Science, Qingdao University of Science and Technology, Qingdao, 266061, China.

College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao, 266061, China.

Publication information

Comput Biol Med. 2023 Sep;163:107145. doi: 10.1016/j.compbiomed.2023.107145. Epub 2023 Jun 8.

DOI:10.1016/j.compbiomed.2023.107145

PMID:37336062

Abstract

S-sulfenylation is a vital post-translational modification (PTM) of proteins. It is an intermediate in other redox reactions and has implications for signal transduction and the regulation of protein function. However, the experimental identification of S-sulfenylation sites is subject to many restrictions. Predicting S-sulfenylation sites by computational methods is therefore fundamental to studying protein function and the related biological mechanisms. In this paper, we propose a method named BiGRUD-SA, based on a bi-directional gated recurrent unit (BiGRU) and a self-attention mechanism, to predict protein S-sulfenylation sites. We first extract features with AAC, BLOSUM62, AAindex, EAAC and GAAC, and fuse them to obtain the original feature space. Next, we use the SMOTE-Tomek method to handle data imbalance. Then, we feed the processed data into the BiGRU and apply the self-attention mechanism for further feature extraction. Finally, we feed the resulting representation into a deep neural network (DNN) to identify S-sulfenylation sites. The accuracies on the training set and the independent test set are 96.66% and 95.91%, respectively, which indicates that our method is conducive to identifying S-sulfenylation sites. Furthermore, we use a data set of S-sulfenylation sites in Arabidopsis thaliana to verify the generalization ability of the BiGRUD-SA method, and obtain better prediction results.
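To make the feature-extraction and fusion step more concrete, the following is a minimal sketch of three of the composition-based encodings named in the abstract (AAC, EAAC and GAAC), concatenated into one vector. It is not the authors' implementation. The sliding-window size of 5, the five residue groups used for GAAC, the example peptide, and all function names are assumptions chosen for illustration. BLOSUM62 and AAindex encodings, as well as any handling of padding characters, are omitted.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

# Assumed physicochemical grouping for GAAC (not stated in the abstract).
GROUPS = {
    "aliphatic": "GAVLMI",
    "aromatic": "FYW",
    "positive": "KRH",
    "negative": "DE",
    "uncharged": "STCPNQ",
}


def aac(seq):
    """Amino acid composition: fraction of each standard residue in seq."""
    counts = Counter(seq)
    n = len(seq)
    return [counts[a] / n for a in AMINO_ACIDS]


def eaac(seq, window=5):
    """Enhanced AAC: AAC of every sliding window, concatenated in order."""
    feats = []
    for i in range(len(seq) - window + 1):
        feats.extend(aac(seq[i:i + window]))
    return feats


def gaac(seq):
    """Grouped AAC: fraction of residues falling into each residue group."""
    counts = Counter(seq)
    n = len(seq)
    return [sum(counts[a] for a in members) / n for members in GROUPS.values()]


def fuse(seq, window=5):
    """Feature fusion by simple concatenation of the three encodings."""
    return aac(seq) + eaac(seq, window) + gaac(seq)


# Hypothetical 21-residue peptide with the candidate cysteine at the center.
peptide = "MKTAYIAKQRCISFVKSHFSR"
vector = fuse(peptide)
print(len(vector))  # 20 (AAC) + 17 windows * 20 (EAAC) + 5 (GAAC)
```

In a pipeline like the one described, vectors of this kind would be computed for every candidate site, rebalanced (the abstract names SMOTE-Tomek), and then passed to the sequence model.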
