Xiang Qilin, Feng Kaiyan, Liao Bo, Liu Yuewu, Huang Guohua
College of Computer Science and Electronic Engineering, Hunan University, Changsha, Hunan, China.
Department of Computer Science, Guangdong AIB Polytechnic, Guangzhou, Guangdong, China.
Comb Chem High Throughput Screen. 2017;20(7):622-628. doi: 10.2174/1386207320666170314102647.
Protein malonylation is a newly discovered post-translational modification. Malonylation is known to be closely associated with type 2 diabetes and to play a regulatory role in fatty acid oxidation and its associated genetic diseases. Identifying protein malonylation sites may lay a solid foundation for exploring malonylation function. Because of the limitations of experimental techniques, identifying malonylation sites quickly and accurately remains a great challenge.
We proposed a computational method to predict malonylation sites and to analyze the malonylation pattern. We first extracted protein segments centered on a lysine residue. Each segment was then encoded by its pseudo amino acid composition (PseAAC). Finally, a support vector machine (SVM) classifier trained on a training dataset was built to distinguish malonylation sites from non-malonylation sites.
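The segment-extraction and encoding steps can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' code: the abstract gives neither the window size nor the PseAAC variant or its parameters, so the half-window of 7, the correlation depth `lam`, the weight `w`, and the three physicochemical properties (hydrophobicity, hydrophilicity, side-chain mass, as commonly used in Chou's type-1 PseAAC) are all assumptions, and the helper names are hypothetical. Lysines too close to a protein terminus are simply skipped here rather than padded.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Raw property values in AMINO_ACIDS order, as commonly tabulated for
# Chou's type-1 PseAAC: hydrophobicity, hydrophilicity, side-chain mass.
# Assumption: verify these against the reference you follow.
RAW_PROPERTIES = [
    [0.62, 0.29, -0.90, -0.74, 1.19, 0.48, -0.40, 1.38, -1.50, 1.06,
     0.64, -0.78, 0.12, -0.85, -2.53, -0.18, -0.05, 1.08, 0.81, 0.26],
    [-0.5, -1.0, 3.0, 3.0, -2.5, 0.0, -0.5, -1.8, 3.0, -1.8,
     -1.3, 0.2, 0.0, 0.2, 3.0, 0.3, -0.4, -1.5, -3.4, -2.3],
    [15, 47, 59, 73, 91, 1, 82, 57, 73, 57,
     75, 58, 42, 72, 101, 31, 45, 43, 130, 107],
]


def _standardize(values):
    """Z-score one property over the 20 amino acids (population std)."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return {aa: (v - mean) / std for aa, v in zip(AMINO_ACIDS, values)}


PROPERTIES = [_standardize(p) for p in RAW_PROPERTIES]


def extract_segments(protein, half_window=7):
    """Return (position, segment) for every lysine with a full window.

    half_window is a hypothetical choice; lysines closer than half_window
    residues to either terminus are skipped.
    """
    segments = []
    for i, residue in enumerate(protein):
        if residue == "K" and half_window <= i < len(protein) - half_window:
            segments.append((i, protein[i - half_window:i + half_window + 1]))
    return segments


def _theta(a, b):
    """Chou's correlation function, averaged over the three properties."""
    return sum((p[b] - p[a]) ** 2 for p in PROPERTIES) / len(PROPERTIES)


def pseaac(segment, lam=3, w=0.05):
    """Type-1 PseAAC vector of length 20 + lam (lam and w are hypothetical)."""
    n = len(segment)
    if lam >= n:
        raise ValueError("lam must be smaller than the segment length")
    # 20 normalized amino acid occurrence frequencies.
    freqs = [segment.count(aa) / n for aa in AMINO_ACIDS]
    # Sequence-order correlation factors for tiers j = 1..lam.
    thetas = [
        sum(_theta(segment[i], segment[i + j]) for i in range(n - j)) / (n - j)
        for j in range(1, lam + 1)
    ]
    denom = sum(freqs) + w * sum(thetas)
    return [f / denom for f in freqs] + [w * t / denom for t in thetas]
```

For example, `[pseaac(seg) for _, seg in extract_segments(protein)]` yields one 23-dimensional vector per usable lysine (20 composition terms plus `lam = 3` correlation terms), which could then be passed to a classifier such as an SVM.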
The leave-one-out test on the training dataset reached an accuracy of 0.7733, and the independent test on the testing dataset achieved an accuracy of 0.8889. Furthermore, the classifier successfully identified 144 of 160 putative malonylation sites. Analyses of the differences between malonylation and non-malonylation segments indicated that lysine malonylation likely follows a specific pattern; e.g., a lysine whose neighbors are glycine and alanine might be more likely to be malonylated. The proposed method is therefore expected to be a promising tool for identifying malonylation sites.
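The leave-one-out protocol behind the 0.7733 figure can be sketched as below. Training an SVM requires an external library, so this dependency-free sketch plugs in a simple nearest-centroid classifier as an explicitly labeled stand-in for the paper's SVM; `nearest_centroid` and `leave_one_out_accuracy` are hypothetical names, and any trainer that returns a predict function (for example, one wrapping an SVM) could be passed instead.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]
Predictor = Callable[[Vector], int]
Trainer = Callable[[List[Vector], List[int]], Predictor]


def nearest_centroid(xs: List[Vector], ys: List[int]) -> Predictor:
    """Stand-in classifier (NOT the paper's SVM): predict the closest class mean."""
    centroids = {}
    for label in set(ys):
        members = [x for x, y in zip(xs, ys) if y == label]
        centroids[label] = [sum(col) / len(members) for col in zip(*members)]

    def predict(x: Vector) -> int:
        # Squared Euclidean distance to each class centroid.
        return min(
            centroids,
            key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, centroids[lab])),
        )

    return predict


def leave_one_out_accuracy(
    xs: List[Vector], ys: List[int], train: Trainer = nearest_centroid
) -> float:
    """Hold out each sample once, train on the rest, and score the held-out one."""
    correct = 0
    for i in range(len(xs)):
        model = train(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        correct += model(xs[i]) == ys[i]
    return correct / len(xs)
```

Accuracy here is correct predictions divided by all samples; by the same arithmetic, identifying 144 of 160 putative sites corresponds to 144/160 = 0.90.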