Prediction of m5C Modifications in RNA Sequences by Combining Multiple Sequence Features.

Authors

Dou Lijun, Li Xiaoling, Ding Hui, Xu Lei, Xiang Huaikun

Affiliations

School of Automotive and Transportation Engineering, Shenzhen Polytechnic, Shenzhen, China; Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China.

Department of Oncology, Heilongjiang Province Land Reclamation Headquarters General Hospital, Harbin, China.

Publication

Mol Ther Nucleic Acids. 2020 Sep 4;21:332-342. doi: 10.1016/j.omtn.2020.06.004. Epub 2020 Jun 10.

DOI:10.1016/j.omtn.2020.06.004
PMID:32645685
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7340967/
Abstract

5-Methylcytosine (m5C) is a well-known post-transcriptional modification that plays significant roles in biological processes such as RNA metabolism, tRNA recognition, and stress responses. Traditional high-throughput techniques for identifying m5C sites are usually time-consuming and expensive. In addition, the number of RNA sequences has grown explosively in the post-genomic era. Machine-learning-based methods are therefore urgently needed to predict RNA m5C modifications quickly and accurately. Here, we propose a novel support-vector-machine (SVM)-based tool, called iRNA-m5C_SVM, that combines multiple sequence features to identify m5C sites in Arabidopsis thaliana. Eight popular feature-extraction methods were first investigated systematically. Then, four well-performing features were incorporated to construct a comprehensive model: position-specific propensity (PSP) (PSNP, PSDP, and PSTP, associated with the frequencies of nucleotides, dinucleotides, and trinucleotides, respectively), nucleotide composition (nucleic acid, dinucleotide, and trinucleotide compositions; NAC, DNC, and TNC, respectively), electron-ion interaction pseudopotentials of trinucleotides (PseEIIPs), and general parallel correlation pseudo-dinucleotide composition (PC-PseDNC-general). The model achieved accuracies of 73.06% in 10-fold cross-validation and 80.15% on the independent test, the best predictive performance in A. thaliana among existing models. The proposed model may therefore be a promising alternative for further research on m5C modification sites in plants.
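The nucleotide-composition features (NAC, DNC, TNC) are normalized counts of every mono-, di- and trinucleotide in a sequence window. The Python sketch below shows one common way to compute them. It assumes A/C/G/U input (T is mapped to U), overlapping windows, and normalization by the number of windows; the abstract does not state the authors' exact normalization or window length, and the function names are hypothetical.

```python
from itertools import product

ALPHABET = "ACGU"  # RNA alphabet; T is converted to U before counting (assumption)


def kmer_composition(seq: str, k: int) -> list[float]:
    """Normalized frequency of every k-mer, 4**k values in lexicographic order."""
    seq = seq.upper().replace("T", "U")
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = len(seq) - k + 1  # number of overlapping windows
    for i in range(max(total, 0)):
        window = seq[i:i + k]
        if window in counts:  # skip windows containing ambiguous bases such as N
            counts[window] += 1
    return [counts[m] / total if total > 0 else 0.0 for m in kmers]


def nucleotide_composition(seq: str) -> list[float]:
    """Hypothetical helper: NAC (4) + DNC (16) + TNC (64) = 84 values."""
    return (kmer_composition(seq, 1)
            + kmer_composition(seq, 2)
            + kmer_composition(seq, 3))
```

Concatenating the three vectors gives an 84-dimensional descriptor. Fusing feature groups by plain concatenation like this is one plausible reading of "combining multiple sequence features", not necessarily the authors' exact scheme.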

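Position-specific propensity encodes each position by how much more often its nucleotide (or di- or trinucleotide) occurs at that position in positive (m5C) training sequences than in negative ones. The sketch below assumes the commonly used difference-of-frequencies definition and equal-length sequences; with k = 1, 2 and 3 it yields PSNP, PSDP and PSTP respectively. Function names are hypothetical, and the authors' implementation may differ.

```python
from collections import Counter


def position_propensity_table(pos_seqs, neg_seqs, k=1):
    """Z[i][kmer] = freq of kmer at position i in positives minus in negatives.

    Assumes every sequence has the same length (e.g. a fixed window around C).
    """
    length = len(pos_seqs[0])
    if any(len(s) != length for s in list(pos_seqs) + list(neg_seqs)):
        raise ValueError("all training sequences must have the same length")
    table = []
    for i in range(length - k + 1):
        pos_counts = Counter(s[i:i + k] for s in pos_seqs)
        neg_counts = Counter(s[i:i + k] for s in neg_seqs)
        table.append({
            kmer: pos_counts[kmer] / len(pos_seqs) - neg_counts[kmer] / len(neg_seqs)
            for kmer in set(pos_counts) | set(neg_counts)
        })
    return table


def encode_psp(seq, table, k=1):
    """Replace each overlapping k-mer of seq by its propensity at that position.

    k-mers never observed at a position during training map to 0.0.
    """
    return [table[i].get(seq[i:i + k], 0.0) for i in range(len(table))]
```

The table should be built from the training split only and then reused for validation and test sequences; building it on all data would leak label information into the features.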

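PseEIIP weights the frequency of each of the 64 trinucleotides by the sum of the electron-ion interaction pseudopotential (EIIP) values of its three nucleotides. The sketch below assumes the per-nucleotide EIIP values commonly used in sequence-feature tools (A 0.1260, C 0.1340, G 0.0806, T/U 0.1335) and overlapping trinucleotide windows; the function name is hypothetical.

```python
from itertools import product

# Per-nucleotide EIIP values as commonly tabulated; U reuses the T value (assumption).
EIIP = {"A": 0.1260, "C": 0.1340, "G": 0.0806, "U": 0.1335}
TRINUCLEOTIDES = ["".join(p) for p in product("ACGU", repeat=3)]


def pse_eiip(seq: str) -> list[float]:
    """64 values: (EIIP_x + EIIP_y + EIIP_z) * frequency of trinucleotide xyz."""
    seq = seq.upper().replace("T", "U")
    windows = [seq[i:i + 3] for i in range(len(seq) - 2)]
    counts = dict.fromkeys(TRINUCLEOTIDES, 0)
    for w in windows:
        if w in counts:  # ignore windows with ambiguous bases such as N
            counts[w] += 1
    total = len(windows)
    return [
        sum(EIIP[b] for b in tri) * (counts[tri] / total if total else 0.0)
        for tri in TRINUCLEOTIDES
    ]
```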
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7470/7340967/d75bb8d42ddb/gr3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7470/7340967/66ca71c5ea3e/fx1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7470/7340967/abc7956928d1/gr1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7470/7340967/e2385b84e41b/gr2.jpg

Similar articles

1
Prediction of m5C Modifications in RNA Sequences by Combining Multiple Sequence Features.
Mol Ther Nucleic Acids. 2020 Sep 4;21:332-342. doi: 10.1016/j.omtn.2020.06.004. Epub 2020 Jun 10.
2
RNAm5CPred: Prediction of RNA 5-Methylcytosine Sites Based on Three Different Kinds of Nucleotide Composition.
Mol Ther Nucleic Acids. 2019 Dec 6;18:739-747. doi: 10.1016/j.omtn.2019.10.008. Epub 2019 Oct 18.
3
Is There Any Sequence Feature in the RNA Pseudouridine Modification Prediction Problem?
Mol Ther Nucleic Acids. 2020 Mar 6;19:293-303. doi: 10.1016/j.omtn.2019.11.014. Epub 2019 Nov 21.
4
m5CPred-SVM: a novel method for predicting m5C sites of RNA.
BMC Bioinformatics. 2020 Oct 30;21(1):489. doi: 10.1186/s12859-020-03828-4.
5
im5C-DSCGA: A Proposed Hybrid Framework Based on Improved DenseNet and Attention Mechanisms for Identifying 5-methylcytosine Sites in Human RNA.
Front Biosci (Landmark Ed). 2023 Dec 26;28(12):346. doi: 10.31083/j.fbl2812346.
6
An improved residual network using deep fusion for identifying RNA 5-methylcytosine sites.
Bioinformatics. 2022 Sep 15;38(18):4271-4277. doi: 10.1093/bioinformatics/btac532.
7
Evaluation of different computational methods on 5-methylcytosine sites identification.
Brief Bioinform. 2020 May 21;21(3):982-995. doi: 10.1093/bib/bbz048.
8
Transcriptome-Wide Annotation of m5C RNA Modifications Using Machine Learning.
Front Plant Sci. 2018 Apr 18;9:519. doi: 10.3389/fpls.2018.00519. eCollection 2018.
9
Staem5: A novel computational approach for accurate prediction of m5C site.
Mol Ther Nucleic Acids. 2021 Oct 20;26:1027-1034. doi: 10.1016/j.omtn.2021.10.012. eCollection 2021 Dec 3.
10
PseUI: Pseudouridine sites identification based on RNA sequence information.
BMC Bioinformatics. 2018 Aug 29;19(1):306. doi: 10.1186/s12859-018-2321-0.

Cited by

1
Identification of m5C RNA modification-related gene signature for predicting prognosis and immune microenvironment-related characteristics of heart failure.
Hereditas. 2025 May 22;162(1):83. doi: 10.1186/s41065-025-00454-z.
2
Human essential gene identification based on feature fusion and feature screening.
IET Syst Biol. 2024 Dec;18(6):227-237. doi: 10.1049/syb2.12105. Epub 2024 Nov 22.
3
A predictive approach for host-pathogen interactions using deep learning and protein sequences.
Virusdisease. 2024 Sep;35(3):434-445. doi: 10.1007/s13337-024-00882-x. Epub 2024 Jul 16.
4
Biological Sequence Classification: A Review on Data and General Methods.
Research (Wash D C). 2022 Dec 19;2022:0011. doi: 10.34133/research.0011. eCollection 2022.
5
A CNN based m5c RNA methylation predictor.
Sci Rep. 2023 Dec 11;13(1):21885. doi: 10.1038/s41598-023-48751-9.
6
Contribution of m5C RNA Modification-Related Genes to Prognosis and Immunotherapy Prediction in Patients with Ovarian Cancer.
Mediators Inflamm. 2023 Nov 13;2023:1400267. doi: 10.1155/2023/1400267. eCollection 2023.
7
EMDL-ac4C: identifying N4-acetylcytidine based on ensemble two-branch residual connection DenseNet and attention.
Front Genet. 2023 Jul 13;14:1232038. doi: 10.3389/fgene.2023.1232038. eCollection 2023.
8
m6Aminer: Predicting the m6Am Sites on mRNA by Fusing Multiple Sequence-Derived Features into a CatBoost-Based Classifier.
Int J Mol Sci. 2023 Apr 26;24(9):7878. doi: 10.3390/ijms24097878.
9
Self-attention enabled deep learning of dihydrouridine (D) modification on mRNAs unveiled a distinct sequence signature from tRNAs.
Mol Ther Nucleic Acids. 2023 Jan 27;31:411-420. doi: 10.1016/j.omtn.2023.01.014. eCollection 2023 Mar 14.
10
Dynamic regulation and key roles of ribonucleic acid methylation.
Front Cell Neurosci. 2022 Dec 19;16:1058083. doi: 10.3389/fncel.2022.1058083. eCollection 2022.