Wei Xirun, Ning Qiao, Che Kuiyang, Liu Zhaowei, Li Hui, Guo Shikai
Department of Information Science and Technology, Dalian Maritime University, Dalian 116026, P.R. China.
School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi 214122, P.R. China.
Bioinformatics. 2025 Mar 4;41(3). doi: 10.1093/bioinformatics/btaf078.
S-sulfhydration, a crucial post-translational protein modification, plays a pivotal role in cellular recognition, signaling, and the development and progression of cardiovascular and neurological disorders, so identifying S-sulfhydration sites is important for cell biology research. Compared with traditional methods, which often lack the sensitivity and specificity needed to locate nonsulfhydration sites accurately, deep learning identifies protein sites with high efficiency and accuracy. We therefore use deep learning to tackle the challenge of pinpointing S-sulfhydration sites.
In this work, we introduce Sul-BertGRU, a deep learning approach designed specifically for predicting S-sulfhydration sites in proteins, which integrates a multi-directional gated recurrent unit (GRU) with BERT. First, Sul-BertGRU uses an information entropy-enhanced BERT (IE-BERT) to preprocess protein sequences and extract initial features. Next, confidence learning is applied to remove potential S-sulfhydration samples from the nonsulfhydration samples, leaving a set of reliable negative samples. Then, to account for the directional nature of the modification process, each protein sequence is split into left, right, and full sequences centered on a cysteine. A multi-directional GRU extracts directional sequence features from these three views and models details of the enzymatic reaction involved in S-sulfhydration. Finally, a multi-head self-attention mechanism is applied in parallel with a convolutional neural network to capture sequence features that might be missed at a local level. Sul-BertGRU achieves sensitivity, specificity, precision, accuracy, Matthews correlation coefficient, and area under the curve scores of 85.82%, 68.24%, 74.80%, 77.44%, 55.13%, and 77.03%, respectively. These results indicate that Sul-BertGRU is a reliable method for predicting protein S-sulfhydration sites.
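The left, right, and full sequences centered on cysteines can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `cysteine_windows`, the flank length of 15 residues, and the `X` padding character are all assumptions made for illustration, since the abstract does not state the window size or how sequence ends are handled.

```python
from typing import Dict, List


def cysteine_windows(sequence: str, flank: int = 15, pad: str = "X") -> List[Dict[str, object]]:
    """Build left, right, and full windows around every cysteine (C).

    Hypothetical helper: `flank` and `pad` are illustrative assumptions,
    not values taken from the paper.
    """
    seq = sequence.upper()
    # Pad both ends so cysteines near a terminus still yield fixed-length windows.
    padded = pad * flank + seq + pad * flank
    windows = []
    for i, residue in enumerate(seq):
        if residue != "C":
            continue
        center = i + flank  # index of this cysteine in the padded string
        windows.append({
            "position": i + 1,  # 1-based position in the original sequence
            "left": padded[center - flank:center + 1],          # upstream flank + C
            "right": padded[center:center + flank + 1],         # C + downstream flank
            "full": padded[center - flank:center + flank + 1],  # both flanks around C
        })
    return windows


if __name__ == "__main__":
    for w in cysteine_windows("MKCAC", flank=2):
        print(w)
```

In this sketch each window keeps the central cysteine, so the left window ends with `C` and the right window starts with `C`. Each of the three views could then be fed to its own recurrent branch, which is one plausible reading of the multi-directional GRU described above.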
The source code and data are available at https://github.com/Severus0902/Sul-BertGRU/.