Hasan Md Mehedi, Guo Dianjing, Kurata Hiroyuki
Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan.
Mol Biosyst. 2017 Nov 21;13(12):2545-2550. doi: 10.1039/c7mb00491e.
Cysteine S-sulfenylation is a major posttranslational modification that contributes to the regulation of protein structure and function in many cellular processes. Experimental identification of S-sulfenylation sites is challenging because of the low abundance of proteins and the inefficiency of experimental methods. Computational identification of S-sulfenylation sites is an alternative strategy for annotating the S-sulfenylated proteome. In this study, a novel computational predictor, SulCysSite, was developed for accurate prediction of S-sulfenylation sites based on multiple sequence features, including amino acid index properties, binary amino acid codes, position-specific scoring matrices, and compositions of profile-based amino acids. A random forest classifier was used to learn the SulCysSite prediction model. The final SulCysSite achieved an AUC value of 0.819 in a 10-fold cross-validation test and outperformed other existing computational predictors. In addition, hidden and complex mechanisms were extracted from the predictive model of SulCysSite to derive understandable rules (i.e., feature combinations) for S-sulfenylation sites. SulCysSite is a useful computational resource for the prediction of S-sulfenylation sites. The online interface and datasets are publicly available at .
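As an illustration of one of the feature types named above, binary amino acid codes, the following is a minimal sketch of how a sequence window centered on a cysteine might be one-hot encoded. It is not the SulCysSite implementation: the flank size, the `-` padding symbol, and the function names `extract_windows` and `binary_encode` are assumptions made for this example, since the abstract does not state them.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
PAD = "-"  # assumed padding symbol for positions beyond the sequence termini


def extract_windows(sequence, flank=10):
    """Return (1-based position, window) for every cysteine in the sequence.

    Each window has 2 * flank + 1 characters with the cysteine at the center.
    The default flank of 10 is an assumption, not a value from the abstract.
    """
    padded = PAD * flank + sequence + PAD * flank
    windows = []
    for i, residue in enumerate(sequence):
        if residue == "C":
            windows.append((i + 1, padded[i:i + 2 * flank + 1]))
    return windows


def binary_encode(window):
    """One-hot encode each residue as 20 bits; padding or unknown residues give all zeros."""
    vector = []
    for residue in window:
        vector.extend(1 if residue == aa else 0 for aa in AMINO_ACIDS)
    return vector


# Toy sequence, made up for illustration only.
toy_windows = extract_windows("MKCAAC", flank=2)
toy_features = [binary_encode(window) for _, window in toy_windows]
```

Each resulting vector has 20 × (2 × flank + 1) entries and could be concatenated with the other feature types before being passed to a classifier such as a random forest.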