Zhou You, Zhang Ning, Li Bi-Qing, Huang Tao, Cai Yu-Dong, Kong Xiang-Yin
a The Key Laboratory of Stem Cell Biology, Institute of Health Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences and Shanghai Jiao Tong University School of Medicine, Shanghai 200031, P.R. China.
b Department of Biomedical Engineering, Tianjin Key Lab of BME Measurement, Tianjin University, Tianjin 300072, P.R. China.
J Biomol Struct Dyn. 2015;33(11):2479-90. doi: 10.1080/07391102.2014.1001793. Epub 2015 Jan 23.
Lysine acetylation and ubiquitination are two primary post-translational modifications (PTMs) in most eukaryotic proteins. Lysine residues are targets for both types of PTMs, and the two modifications lead to different cellular roles. Despite the increasing availability of protein sequences and PTM data, distinguishing the two types of PTMs on lysine residues remains challenging, and experimental approaches are often laborious and time-consuming. There is an urgent need for computational tools to distinguish between lysine acetylation and ubiquitination. In this study, we developed a novel method, called DAUFSA (distinguish between lysine acetylation and lysine ubiquitination with feature selection and analysis), to discriminate ubiquitinated and acetylated lysine residues. The method incorporated several types of features: PSSM (position-specific scoring matrix) conservation scores, amino acid factors, secondary structures, solvent accessibilities, and disorder scores. Using the mRMR (maximum relevance minimum redundancy) method and the IFS (incremental feature selection) method, an optimal feature set containing 290 features was selected from all incorporated features. A dagging-based classifier built on the optimal features achieved a classification accuracy of 69.53%, with a Matthews correlation coefficient (MCC) of 0.3853. Analysis of the optimal feature set showed that the PSSM conservation score features and the amino acid factor features were the most important attributes, suggesting that acetylation and ubiquitination sites differ in these properties. Our results also supported previous findings that acetylation and ubiquitination employ different motifs. The feature differences between the two modifications revealed in this study are worthy of experimental validation and further investigation.
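The selection procedure summarized in the abstract (rank the features, then grow the feature set one feature at a time and keep the best-scoring prefix) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the ranked list is assumed to come from a separate mRMR step, `evaluate` is a hypothetical callback standing in for training and scoring a classifier on a feature subset, and the function names are chosen for illustration only. The `mcc` helper uses the standard definition of the Matthews correlation coefficient from confusion-matrix counts.

```python
import math
from typing import Callable, List, Sequence, Tuple


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Conventional fallback when any marginal total is zero.
        return 0.0
    return (tp * tn - fp * fn) / denom


def incremental_feature_selection(
    ranked_features: Sequence[str],
    evaluate: Callable[[Sequence[str]], float],
) -> Tuple[List[str], float]:
    """Score the top-1, top-2, ..., top-N ranked features and return
    the best-scoring prefix together with its score.

    `ranked_features` is assumed to be pre-sorted (for example by mRMR);
    `evaluate` is a hypothetical scoring function, such as the MCC of a
    classifier trained on the given subset.
    """
    best_k = 0
    best_score = float("-inf")
    for k in range(1, len(ranked_features) + 1):
        score = evaluate(ranked_features[:k])
        if score > best_score:  # ties keep the smaller feature set
            best_k, best_score = k, score
    return list(ranked_features[:best_k]), best_score


if __name__ == "__main__":
    # Toy scoring function that peaks at three features (illustration only).
    toy_ranked = ["f1", "f2", "f3", "f4", "f5"]
    subset, score = incremental_feature_selection(
        toy_ranked, lambda feats: -abs(len(feats) - 3)
    )
    print(subset, score)
```

In a real pipeline, `evaluate` would typically wrap a cross-validated classifier, and the prefix with the highest score (290 features in the study) would be taken as the optimal feature set.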