Zhao Xiaowei, Li Xiangtao, Ma Zhiqiang, Yin Minghao
College of Life Science, Northeast Normal University, 5268 Renmin Street, Changchun 130024, China; College of Computer Science, Northeast Normal University, 2555 Jingyue Street, Changchun 130117, China.
Int J Mol Sci. 2011;12(12):8347-61. doi: 10.3390/ijms12128347. Epub 2011 Nov 28.
Ubiquitylation is an important post-translational modification. Correctly identifying protein lysine ubiquitylation sites is fundamental to understanding the molecular mechanism of lysine ubiquitylation in biological systems. This paper develops a novel computational method, based on an ensemble approach, to identify lysine ubiquitylation sites. In the proposed method, 468 ubiquitylation sites from 323 proteins retrieved from the Swiss-Prot database were encoded into feature vectors using four kinds of protein sequence information. A feature selection method was then applied to extract informative feature subsets. Different feature subsets were obtained by setting different starting points in the search procedure; each subset was used to train a random forest classifier, and the resulting classifiers were aggregated into a consensus classifier by majority voting. Evaluated by jackknife tests and independent tests, the predictor reached an accuracy of 76.82% on the training dataset and 79.16% on the test dataset, respectively, indicating that it is a useful tool for predicting lysine ubiquitylation sites. Furthermore, site-specific feature analysis showed that ubiquitylation is closely correlated with the features of the surrounding sites in addition to features derived from the lysine site itself. The feature selection method is available upon request.
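The ensemble and evaluation scheme described in the abstract (one classifier per feature subset, majority voting, jackknife testing) can be illustrated with a minimal Python sketch. This is not the authors' implementation. To stay dependency-free it swaps in a simple nearest-centroid learner where the paper uses random forests, the feature subsets are hand-picked rather than produced by the paper's feature selection method, and the data, function names, and parameters are all hypothetical.

```python
from collections import Counter

def train_centroid(X, y, cols):
    """Stand-in base learner (NOT a random forest): per-class mean of the chosen columns."""
    sums, counts = {}, Counter(y)
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(cols))
        for i, c in enumerate(cols):
            acc[i] += row[c]
    return {lab: [v / counts[lab] for v in acc] for lab, acc in sums.items()}

def predict_centroid(model, row, cols):
    """Assign the class whose centroid is nearest (squared Euclidean distance)."""
    def dist(center):
        return sum((row[c] - m) ** 2 for c, m in zip(cols, center))
    return min(sorted(model), key=lambda lab: dist(model[lab]))

def majority_vote(votes):
    """Return the most frequent label; an odd number of voters avoids binary ties."""
    return Counter(votes).most_common(1)[0][0]

def consensus_predict(X, y, subsets, row):
    """Train one base learner per feature subset and aggregate by majority voting."""
    votes = [predict_centroid(train_centroid(X, y, cols), row, cols)
             for cols in subsets]
    return majority_vote(votes)

def jackknife_accuracy(X, y, subsets):
    """Jackknife (leave-one-out) test: hold out each sample once, train on the rest."""
    hits = 0
    for i in range(len(X)):
        X_train, y_train = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        hits += consensus_predict(X_train, y_train, subsets, X[i]) == y[i]
    return hits / len(X)

# Hypothetical toy data: 3 features per site, label 1 = ubiquitylated, 0 = not.
X = [[0.9, 0.8, 0.1], [0.8, 0.9, 0.2], [0.7, 0.7, 0.9],
     [0.1, 0.2, 0.8], [0.2, 0.1, 0.9], [0.3, 0.3, 0.1]]
y = [1, 1, 1, 0, 0, 0]
subsets = [(0, 1), (0, 2), (1, 2)]  # hand-picked feature subsets (assumption)

print("jackknife accuracy:", jackknife_accuracy(X, y, subsets))
```

Using an odd number of feature subsets keeps the binary vote free of ties; with an even number, a tie-breaking rule would have to be chosen explicitly.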