School of Information Science and Technology, Northeast Normal University, Changchun, 130117, China.
Key Laboratory of Intelligent Information Processing of Jilin Universities, Northeast Normal University, Changchun, 130117, China.
BMC Bioinformatics. 2018 Jun 25;19(1):237. doi: 10.1186/s12859-018-2249-4.
Lysine succinylation is a new kind of post-translational modification that plays a key role in regulating protein conformation and controlling cellular function. Understanding the mechanism of succinylation in depth requires accurate identification of succinylation sites in proteins. However, traditional experimental approaches are labor-intensive and time-consuming. Computational prediction methods have been proposed in recent years and are popular because of their convenience and speed. In this study, we developed a new method to predict succinylation sites in proteins that combines multiple features, including amino acid composition, binary encoding, physicochemical properties and grey pseudo amino acid composition, with a feature selection scheme (information gain). The model was then trained using SVM (Support Vector Machine) and an ensemble learning algorithm.
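To make two of the feature types and the selection criterion concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes a peptide window given as a string of one-letter residue codes, with any non-standard symbol (for example a padding character) contributing nothing. The function names and the alphabet ordering are illustrative choices. Physicochemical and grey pseudo amino acid features are omitted.

```python
import math
from collections import Counter

# Assumed ordering of the 20 standard residues (illustrative, not from the paper).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(peptide):
    """Return the frequency of each standard residue in the window (20 values)."""
    counts = Counter(peptide)
    # Non-standard symbols are ignored when normalizing.
    total = sum(counts[a] for a in AMINO_ACIDS) or 1
    return [counts[a] / total for a in AMINO_ACIDS]


def binary_encoding(peptide):
    """One-hot encode each position; non-standard symbols become all zeros."""
    vector = []
    for residue in peptide:
        vector.extend(1 if residue == a else 0 for a in AMINO_ACIDS)
    return vector


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """Information gain of one discrete feature with respect to the class labels."""
    total = len(labels)
    conditional = 0.0
    for value in set(feature_values):
        subset = [y for x, y in zip(feature_values, labels) if x == value]
        conditional += len(subset) / total * entropy(subset)
    return entropy(labels) - conditional
```

In a pipeline of this kind, the feature vectors of all windows would be concatenated, each column scored with `information_gain` (after discretizing continuous features), and the highest-scoring columns passed to the classifier.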
The method achieved an accuracy of 89.14% and an MCC (Matthews correlation coefficient) of 0.79 in 10-fold cross-validation on the training dataset, and an accuracy of 84.5% and an MCC of 0.2 on the independent dataset.
The conclusions of this study can help improve understanding of the succinylation mechanism. These results suggest that our method is promising for predicting succinylation sites. The source code and data for this paper are freely available at https://github.com/ningq669/PSuccE.
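Accuracy and MCC can diverge sharply when positive and negative sites are imbalanced, which is one possible reading of the gap between the two metrics on the independent dataset. The sketch below computes both from a confusion matrix. The counts are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal total is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical imbalanced test set: 50 positive sites, 950 negative sites.
tp, fn, tn, fp = 20, 30, 830, 120

acc_value = accuracy(tp, tn, fp, fn)
mcc_value = mcc(tp, tn, fp, fn)
print(f"accuracy = {acc_value:.3f}, MCC = {mcc_value:.3f}")
```

With these made-up counts the accuracy is 0.85 while the MCC stays below 0.25, because the many correctly rejected negatives dominate accuracy but not MCC.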