Yang Hui, Lv Hao, Ding Hui, Chen Wei, Lin Hao
1 Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China.
2 Department of Physics, School of Sciences, and Center for Genomics and Computational Biology, North China University of Science and Technology, Tangshan, China.
J Comput Biol. 2018 Nov;25(11):1266-1277. doi: 10.1089/cmb.2018.0004. Epub 2018 Aug 16.
2'-O-methylation plays an important biological role in gene expression. Given the explosive growth of genomic sequencing data, a method is needed to identify quickly and efficiently whether a sequence contains a 2'-O-methylation site. Computational methods can complement experimental techniques in identifying such sites. In this study, using experimentally determined 2'-O-methylation data from Homo sapiens, we proposed a support vector machine-based model to predict 2'-O-methylation sites in H. sapiens. In this model, the RNA sequences were encoded with the optimal features obtained through feature selection. In the fivefold cross-validation test, the model reached an accuracy of 97.95%.
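The evaluation workflow described in the abstract (encode RNA sequences as feature vectors, train a classifier, and report fivefold cross-validation accuracy) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not state which features were used, so the k-mer frequency encoding is an assumption, the function names are hypothetical, the sequences are made-up toy data, and a nearest-centroid classifier stands in for the support vector machine so the sketch needs only the standard library.

```python
from itertools import product
import random


def kmer_frequencies(seq, k=2):
    """Encode an RNA sequence as normalized k-mer frequencies (4**k values).

    Assumption: k-mer composition is used only for illustration; the
    abstract does not specify the features chosen by feature selection.
    """
    kmers = ["".join(p) for p in product("ACGU", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # skip k-mers containing non-ACGU characters
            counts[kmer] += 1
    total = max(sum(counts.values()), 1)
    return [counts[m] / total for m in kmers]


def fold_indices(n, k=5, seed=0):
    """Shuffle sample indices and split them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validated_accuracy(X, y, fit, predict, k=5, seed=0):
    """Train on k-1 folds, test on the held-out fold, pool the accuracy."""
    correct = 0
    for held_out in fold_indices(len(X), k, seed):
        held = set(held_out)
        train = [i for i in range(len(X)) if i not in held]
        model = fit([X[i] for i in train], [y[i] for i in train])
        correct += sum(predict(model, X[i]) == y[i] for i in held_out)
    return correct / len(X)


# Stand-in classifier (NOT an SVM): nearest class centroid.
def fit_centroids(X, y):
    centroids = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def predict_centroid(model, x):
    def sq_dist(label):
        return sum((a - b) ** 2 for a, b in zip(x, model[label]))
    return min(model, key=sq_dist)


# Made-up toy sequences; label 1 = "site", label 0 = "no site".
toy_sequences = [
    ("AAUAAAGAAUA", 1), ("AAAUAAAACAA", 1), ("UAAAAAUAAAG", 1),
    ("AAAAGAAAUAA", 1), ("AUAAAAAAACA", 1),
    ("GGCGGGCGGUG", 0), ("GCGGGGCGGGC", 0), ("CGGGCGGGGAG", 0),
    ("GGGGCGGCGGU", 0), ("GGCGGGGGCGC", 0),
]
X = [kmer_frequencies(seq, k=2) for seq, _ in toy_sequences]
y = [label for _, label in toy_sequences]
acc = cross_validated_accuracy(X, y, fit_centroids, predict_centroid, k=5)
print(f"fivefold cross-validation accuracy on toy data: {acc:.2f}")
```

The `fit` and `predict` arguments keep the cross-validation loop independent of the classifier, so an actual support vector machine could be substituted for the stand-in without changing the evaluation code.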