School of Sciences, Anhui Agricultural University, Hefei, 230036, Anhui, China.
School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, China.
BMC Bioinformatics. 2020 Oct 30;21(1):489. doi: 10.1186/s12859-020-03828-4.
As one of the most common post-transcriptional modifications (PTCM) in RNA, 5-cytosine-methylation plays important roles in many biological functions, such as RNA metabolism and cell fate decision. Accurate identification of 5-methylcytosine (m5C) sites on RNA helps researchers better understand the exact role of 5-cytosine-methylation in these functions. In recent years, computational methods for predicting m5C sites have attracted considerable interest because of their efficiency and low cost. However, neither the accuracy nor the efficiency of these methods is yet satisfactory, and both need further improvement.
In this work, we developed a new computational method, m5CPred-SVM, to identify m5C sites in three species: H. sapiens, M. musculus and A. thaliana. To build the model, we first collected benchmark datasets following three recently published methods. Then, six types of sequence-based features were generated from RNA segments, and a sequential forward feature selection strategy was used to obtain the optimal feature subset. After that, the performance of models based on different learning algorithms was compared, and the model based on the support vector machine provided the highest prediction accuracy. Finally, m5CPred-SVM was compared with several existing methods, and the results showed that it offered substantially higher prediction accuracy than previously published methods. We expect m5CPred-SVM to become a useful tool for accurate identification of m5C sites.
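The sequential forward feature selection step mentioned above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name, the stopping rule (stop when no candidate improves the score) and the toy scoring function are assumptions made for illustration. In practice the score would typically be a cross-validated accuracy of the classifier trained on the candidate subset.

```python
from typing import Callable, List


def sequential_forward_selection(
    n_features: int,
    score: Callable[[List[int]], float],
) -> List[int]:
    """Greedily add the feature that most improves `score`.

    Stops as soon as no remaining feature raises the score
    (an assumed stopping rule; the abstract does not state one).
    """
    selected: List[int] = []
    best_score = float("-inf")
    remaining = set(range(n_features))
    while remaining:
        # Score every subset formed by adding one unused feature.
        candidate_scores = {f: score(selected + [f]) for f in sorted(remaining)}
        best_feature = max(sorted(candidate_scores), key=lambda f: candidate_scores[f])
        if candidate_scores[best_feature] <= best_score:
            break  # no candidate improves on the current subset
        selected.append(best_feature)
        best_score = candidate_scores[best_feature]
        remaining.remove(best_feature)
    return selected


def toy_score(subset: List[int]) -> float:
    """Hypothetical stand-in for cross-validated accuracy.

    Features 1 and 3 are informative; every selected feature
    costs a small penalty so useless ones are never kept.
    """
    useful = {1: 0.3, 3: 0.2}
    return 0.5 + sum(useful.get(f, 0.0) for f in subset) - 0.01 * len(subset)


print(sequential_forward_selection(5, toy_score))
```

With the toy score, the procedure picks feature 1 first, then feature 3, and stops because adding any other feature lowers the score.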
In this study, by introducing position-specific propensity related features, we built a new model, m5CPred-SVM, to predict RNA m5C sites in three different species. The results show that our model outperformed the existing state-of-the-art models. The model is available to users through a web server at https://zhulab.ahu.edu.cn/m5CPred-SVM.
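The abstract does not define the position-specific propensity features, so the following is only a sketch of one common formulation, assumed here for illustration: for each position in the RNA segment, the propensity of a nucleotide is its frequency among positive (m5C) segments minus its frequency among negative segments, and a segment is encoded by looking up the value for the nucleotide it carries at each position. All function names and the toy segments are hypothetical.

```python
from typing import Dict, List

NUCLEOTIDES = "ACGU"


def position_frequencies(segments: List[str]) -> List[Dict[str, float]]:
    """Per-position nucleotide frequencies over equal-length segments."""
    length = len(segments[0])
    freqs = []
    for pos in range(length):
        column = [s[pos] for s in segments]
        freqs.append({n: column.count(n) / len(segments) for n in NUCLEOTIDES})
    return freqs


def propensity_table(
    positives: List[str], negatives: List[str]
) -> List[Dict[str, float]]:
    """Frequency in positives minus frequency in negatives, per position."""
    pos_f = position_frequencies(positives)
    neg_f = position_frequencies(negatives)
    return [{n: p[n] - q[n] for n in NUCLEOTIDES} for p, q in zip(pos_f, neg_f)]


def encode(segment: str, table: List[Dict[str, float]]) -> List[float]:
    """Map each nucleotide to its propensity at that position (0.0 if unknown)."""
    return [table[i].get(nt, 0.0) for i, nt in enumerate(segment)]


# Toy segments of length 3 centred on the candidate cytosine (made-up data).
positives = ["ACG", "ACU"]
negatives = ["GCA", "UCA"]
table = propensity_table(positives, negatives)
print(encode("ACG", table))
```

The table should be estimated from training data only, so that the encoded features do not leak label information from the test segments.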