Shien Dray-Ming, Lee Tzong-Yi, Chang Wen-Chi, Hsu Justin Bo-Kai, Horng Jorng-Tzong, Hsu Po-Chiang, Wang Ting-Yuan, Huang Hsien-Da
Department of Computer Science and Information Engineering, National Central University, Chung-Li 320, Taiwan.
J Comput Chem. 2009 Jul 15;30(9):1532-43. doi: 10.1002/jcc.21232.
Studies over the last few years have identified methylation on histones and on other proteins involved in the regulation of gene transcription. Several computational approaches have been developed to identify potential methylation sites on lysine and arginine. Studies of protein tertiary structure have demonstrated that methylation sites occur preferentially in regions that are easily accessible. However, previous studies have not taken into account the solvent-accessible surface area (ASA) surrounding the methylation sites. This work presents a method named MASA, which combines a support vector machine with the sequence and structural characteristics of proteins to identify methylation sites on lysine, arginine, glutamate, and asparagine. Because most experimentally identified methylation sites have no corresponding protein tertiary structure in the Protein Data Bank, effective solvent-accessibility prediction tools were adopted to estimate the ASA values of amino acids in proteins. Evaluation of predictive performance by cross-validation indicates that the ASA values around the methylation sites can improve the accuracy of prediction. Additionally, an independent test shows prediction accuracies of 80.8% and 85.0% for methylated lysine and arginine, respectively. Finally, the proposed method is implemented as a web server for identifying protein methylation sites, freely available at http://MASA.mbc.nctu.edu.tw/.
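As a rough illustration of how sequence and ASA information around a candidate site might be turned into a feature vector for a support vector machine, consider the minimal Python sketch below. It is not the MASA implementation: the function names, the window half-width, the one-hot residue encoding, and the zero padding at the sequence ends are all assumptions made for illustration, and the ASA values are expected to come from an external prediction tool.

```python
# Hypothetical feature encoding for candidate methylation sites.
# Window size, encoding, and padding are illustrative assumptions,
# not the settings used by MASA.

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def candidate_sites(sequence, residue="K"):
    """Return 0-based positions of every occurrence of `residue`."""
    return [i for i, aa in enumerate(sequence) if aa == residue]


def window_features(sequence, asa, center, half_width=6):
    """Encode the window around `center` as one flat feature vector.

    Each window position contributes 21 numbers: a 20-element one-hot
    residue encoding followed by that residue's ASA value. Positions
    that fall outside the sequence are padded with zeros.
    """
    if len(sequence) != len(asa):
        raise ValueError("sequence and asa must have the same length")
    features = []
    for pos in range(center - half_width, center + half_width + 1):
        one_hot = [0.0] * len(AMINO_ACIDS)
        accessibility = 0.0  # padding value for out-of-range positions
        if 0 <= pos < len(sequence):
            idx = AMINO_ACIDS.find(sequence[pos])
            if idx >= 0:  # non-standard residues stay all-zero
                one_hot[idx] = 1.0
            accessibility = float(asa[pos])
        features.extend(one_hot)
        features.append(accessibility)
    return features


if __name__ == "__main__":
    seq = "MKRAKT"                                # made-up sequence
    asa = [0.10, 0.80, 0.65, 0.20, 0.90, 0.40]    # made-up ASA values
    for site in candidate_sites(seq, "K"):
        vector = window_features(seq, asa, site, half_width=2)
        print(site, len(vector))
```

Vectors built this way, labeled as methylated or non-methylated, could then be passed to any SVM implementation (for example scikit-learn's `SVC`) and compared under cross-validation with and without the ASA column to see whether accessibility helps.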