College of Science, Dalian Maritime University, Dalian 116026, PR China.
J Theor Biol. 2019 Jan 14;461:51-58. doi: 10.1016/j.jtbi.2018.10.046. Epub 2018 Oct 23.
Protein S-sulfenylation is an essential post-translational modification (PTM) that provides critical information for understanding the molecular mechanisms of cell signal transduction, stress response, and the regulation of cellular functions. Recent advances in computational methods have contributed to the detection of protein S-sulfenylation sites. However, class imbalance in the training datasets can degrade the performance of these methods when identifying S-sulfenylation sites. In this study, we designed the Fu-SulfPred model, which uses a stratified structure of three kinds of decision trees, together with reconstruction of the training datasets and sample rescaling, to identify candidate protein S-sulfenylation sites. Experimental results showed that the Matthews correlation coefficient (MCC) values of the Fu-SulfPred model were 0.5437, 0.3736 and 0.6809 on three independent test datasets, respectively, all of which exceeded the corresponding MCC values of the S-SulfPred model. The Fu-SulfPred model provides a promising scheme for identifying protein S-sulfenylation sites and other post-translational modifications.
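The abstract reports performance as Matthews coefficient values, a metric commonly chosen when classes are imbalanced. As an illustration only, the following minimal sketch computes that coefficient from confusion-matrix counts using its standard definition. The function name `matthews_corrcoef_from_counts` and all counts below are hypothetical, chosen to show the behaviour of the metric; they are not taken from the paper, and this is not the authors' evaluation code.

```python
import math


def matthews_corrcoef_from_counts(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when the denominator is zero (a common convention),
    for example when a classifier predicts only one class.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical imbalanced test set: 100 modified sites, 900 unmodified sites.
# A classifier that always predicts "unmodified" reaches 90% accuracy,
# yet its MCC is 0.0 because it carries no information about the positives.
always_negative_mcc = matthews_corrcoef_from_counts(tp=0, tn=900, fp=0, fn=100)
always_negative_accuracy = (0 + 900) / 1000

# A classifier that recovers most positives at the cost of some false alarms.
informative_mcc = matthews_corrcoef_from_counts(tp=70, tn=810, fp=90, fn=30)

# A perfect classifier scores 1.0.
perfect_mcc = matthews_corrcoef_from_counts(tp=100, tn=900, fp=0, fn=0)

print(always_negative_accuracy, always_negative_mcc)
print(round(informative_mcc, 3), perfect_mcc)
```

Because the coefficient uses all four cells of the confusion matrix, a trivial majority-class predictor scores zero regardless of how skewed the dataset is, which is why it is a more informative summary than accuracy for imbalanced site-prediction tasks.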