m5CPred-SVM: a novel method for predicting m5C sites of RNA.

Affiliations

School of Sciences, Anhui Agricultural University, Hefei, 230036, Anhui, China.

School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, China.

Publication information

BMC Bioinformatics. 2020 Oct 30;21(1):489. doi: 10.1186/s12859-020-03828-4.

DOI:10.1186/s12859-020-03828-4

PMID:33126851

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7602301/

Abstract

BACKGROUND

As one of the most common post-transcriptional modifications (PTCM) in RNA, 5-cytosine-methylation plays important roles in many biological functions, such as RNA metabolism and cell fate decision. Accurate identification of 5-methylcytosine (m5C) sites on RNA helps researchers better understand the exact role of 5-cytosine-methylation in these functions. In recent years, computational methods for predicting m5C sites have attracted considerable interest because of their efficiency and low cost. However, neither the accuracy nor the efficiency of these methods is yet satisfactory, and both need further improvement.

RESULTS

In this work, we developed a new computational method, m5CPred-SVM, to identify m5C sites in three species: H. sapiens, M. musculus and A. thaliana. To build the model, we first collected benchmark datasets following three recently published methods. Next, six types of sequence-based features were generated from RNA segments, and a sequential forward feature selection strategy was used to obtain the optimal feature subset. The performance of models based on different learning algorithms was then compared, and the model based on the support vector machine provided the highest prediction accuracy. Finally, m5CPred-SVM was compared with several existing methods, and the results showed that it offered substantially higher prediction accuracy than previously published methods. We expect m5CPred-SVM to become a useful tool for accurate identification of m5C sites.
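
The sequential forward feature selection step can be illustrated with a minimal Python sketch. The abstract does not specify the scoring criterion or the stopping rule, so the greedy loop below, the function name `sequential_forward_selection`, and the `toy_score` function are all assumptions made for illustration. In practice the score would presumably be a cross-validated performance measure of the classifier trained on the candidate subset.

```python
from typing import Callable, List, Sequence, Tuple


def sequential_forward_selection(
    n_features: int,
    score: Callable[[Sequence[int]], float],
) -> Tuple[List[int], float]:
    """Greedily add the feature that most improves `score`.

    Stops as soon as no remaining feature improves the best score so far
    (an assumed stopping rule, not taken from the paper).
    """
    selected: List[int] = []
    best_score = float("-inf")
    remaining = set(range(n_features))
    while remaining:
        # Score the current subset extended by each remaining feature.
        candidate_scores = {f: score(selected + [f]) for f in sorted(remaining)}
        # Pick the highest score; break ties by the lowest feature index.
        best_f = max(candidate_scores, key=lambda f: (candidate_scores[f], -f))
        if candidate_scores[best_f] <= best_score:
            break  # no candidate improves on the current subset
        selected.append(best_f)
        best_score = candidate_scores[best_f]
        remaining.remove(best_f)
    return selected, best_score


def toy_score(subset: Sequence[int]) -> float:
    """Hypothetical stand-in for cross-validated accuracy.

    Features 0 and 2 help; every other feature slightly hurts.
    """
    useful = {0: 0.30, 2: 0.20}
    return 0.5 + sum(useful.get(f, -0.05) for f in subset)


subset, best = sequential_forward_selection(4, toy_score)
print(subset, best)
```

With this toy score, selection stops after features 0 and 2, because adding either remaining feature lowers the score.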

CONCLUSION

In this study, by introducing position-specific propensity related features, we built a new model, m5CPred-SVM, to predict RNA m5C sites in three different species. The results show that our model outperformed existing state-of-the-art models. A web server for the model is available at https://zhulab.ahu.edu.cn/m5CPred-SVM.
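
One plausible reading of a position-specific propensity feature is the difference between the per-position nucleotide frequencies of positive (m5C) and negative segments. The abstract does not give the formula, so the definition, the function names, and the toy three-nucleotide segments below are assumptions for illustration, not the authors' implementation.

```python
from collections import Counter
from typing import Dict, List

NUCLEOTIDES = "ACGU"


def position_frequencies(segments: List[str]) -> List[Dict[str, float]]:
    """Per-position nucleotide frequencies for equal-length RNA segments."""
    if not segments:
        raise ValueError("need at least one segment")
    length = len(segments[0])
    if any(len(s) != length for s in segments):
        raise ValueError("all segments must have the same length")
    freqs = []
    for i in range(length):
        counts = Counter(s[i] for s in segments)
        freqs.append({n: counts[n] / len(segments) for n in NUCLEOTIDES})
    return freqs


def propensity_table(
    positives: List[str], negatives: List[str]
) -> List[Dict[str, float]]:
    """Frequency in positives minus frequency in negatives, per position."""
    pos = position_frequencies(positives)
    neg = position_frequencies(negatives)
    if len(pos) != len(neg):
        raise ValueError("positive and negative segments differ in length")
    return [{n: p[n] - q[n] for n in NUCLEOTIDES} for p, q in zip(pos, neg)]


def encode(segment: str, table: List[Dict[str, float]]) -> List[float]:
    """Map each nucleotide to its propensity at that position.

    Symbols outside ACGU (for example N) are encoded as 0.0.
    """
    if len(segment) != len(table):
        raise ValueError("segment length does not match the table")
    return [table[i].get(nt, 0.0) for i, nt in enumerate(segment)]


# Toy segments centred on a cytosine; real segments would be much longer.
positives = ["ACG", "ACU"]
negatives = ["GCA", "UCA"]
table = propensity_table(positives, negatives)
features = encode("ACG", table)
print(features)
```

The central cytosine is present in every segment, so its propensity is zero and it carries no discriminative signal; only the flanking positions contribute.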

Figure 1 (image): https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ef2d/7602301/749051da309a/12859_2020_3828_Fig1_HTML.jpg
