Deepm5C: A deep-learning-based hybrid framework for identifying human RNA N5-methylcytosine sites using a stacking strategy.

Affiliations

Tulane Center for Biomedical Informatics and Genomics, Division of Biomedical Informatics and Genomics, John W. Deming Department of Medicine, School of Medicine, Tulane University, New Orleans, LA 70112, USA.

Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan.

Publication information

Mol Ther. 2022 Aug 3;30(8):2856-2867. doi: 10.1016/j.ymthe.2022.05.001. Epub 2022 May 6.

DOI:10.1016/j.ymthe.2022.05.001

PMID:35526094

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9372321/

Abstract

As one of the most prevalent post-transcriptional epigenetic modifications, N5-methylcytosine (m5C) plays an essential role in various cellular processes and disease pathogenesis. It is therefore important to identify m5C modifications accurately in order to gain a deeper understanding of cellular processes and other possible functional mechanisms. Although a few computational methods have been proposed, their models were developed using small training datasets, so their practical application to genome-wide detection is quite limited. To overcome these limitations, we propose Deepm5C, a bioinformatics method for identifying RNA m5C sites throughout the human genome. To develop Deepm5C, we constructed a novel benchmarking dataset and investigated a mixture of three conventional feature-encoding algorithms and a feature derived from word-embedding approaches. Four variants of deep-learning classifiers and four commonly used conventional classifiers were then trained with each of the four encodings, yielding 32 baseline models. A stacking strategy was applied by integrating the predicted outputs of the optimal baseline models and using them to train a one-dimensional (1D) convolutional neural network. As a result, the Deepm5C predictor achieved a Matthews correlation coefficient of 0.697 and an accuracy of 0.855 during cross-validation. The corresponding metrics on the independent test were 0.691 and 0.852, respectively. Overall, Deepm5C achieved more accurate and stable performance than the baseline models and significantly outperformed the existing predictors, demonstrating the effectiveness of our proposed hybrid framework. Furthermore, Deepm5C is expected to assist community-wide efforts in identifying putative m5C sites and in formulating novel testable biological hypotheses.
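The pipeline outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The encoding and classifier names are hypothetical placeholders, since the abstract gives only their counts, and the sketch stops at building the stacked meta-features rather than training the 1D convolutional neural network that the paper uses as the meta-learner. The accuracy and Matthews correlation coefficient functions use the standard confusion-matrix definitions.

```python
from itertools import product
from math import sqrt

# Hypothetical placeholder names: the abstract states only the counts
# (4 encodings, 4 deep-learning and 4 conventional classifiers).
ENCODINGS = [f"encoding_{i}" for i in range(1, 5)]
CLASSIFIERS = [f"dl_{i}" for i in range(1, 5)] + [f"ml_{i}" for i in range(1, 5)]


def baseline_grid():
    """Return every (classifier, encoding) pair: 8 x 4 = 32 baseline models."""
    return list(product(CLASSIFIERS, ENCODINGS))


def stack_features(baseline_probs):
    """Build one meta-feature row per sample from baseline predictions.

    baseline_probs maps a model name to a list of predicted probabilities,
    one per sample. Each returned row holds that sample's probability from
    every model, in sorted model-name order. A meta-learner would be
    trained on these rows.
    """
    names = sorted(baseline_probs)
    n_samples = len(baseline_probs[names[0]])
    return [[baseline_probs[m][i] for m in names] for i in range(n_samples)]


def accuracy(tp, tn, fp, fn):
    """Fraction of correctly classified samples."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; 0.0 when the denominator is zero."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

For example, `stack_features({"a": [0.1, 0.9], "b": [0.2, 0.8]})` gives one row per sample with one column per baseline model. Unlike accuracy, the Matthews correlation coefficient ranges from -1 to 1 and accounts for all four confusion-matrix cells.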

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/82ab/9372321/ae0fdcfbbfe2/fx1.jpg

Similar articles

im5C-DSCGA: A Proposed Hybrid Framework Based on Improved DenseNet and Attention Mechanisms for Identifying 5-methylcytosine Sites in Human RNA.

Front Biosci (Landmark Ed). 2023 Dec 26;28(12):346. doi: 10.31083/j.fbl2812346.

Deep transformers and convolutional neural network in identifying DNA N6-methyladenine sites in cross-species genomes.

Methods. 2022 Aug;204:199-206. doi: 10.1016/j.ymeth.2021.12.004. Epub 2021 Dec 13.

MLm5C: A high-precision human RNA 5-methylcytosine sites predictor based on a combination of hybrid machine learning models.

Methods. 2024 Jul;227:37-47. doi: 10.1016/j.ymeth.2024.05.004. Epub 2024 May 8.

Comprehensive review and assessment of computational methods for predicting RNA post-transcriptional modification sites from RNA sequences.

Brief Bioinform. 2020 Sep 25;21(5):1676-1696. doi: 10.1093/bib/bbz112.

DeepVF: a deep learning-based hybrid framework for identifying virulence factors using the stacking strategy.

Brief Bioinform. 2021 May 20;22(3). doi: 10.1093/bib/bbaa125.

XGBoost framework with feature selection for the prediction of RNA N5-methylcytosine sites.

Mol Ther. 2023 Aug 2;31(8):2543-2551. doi: 10.1016/j.ymthe.2023.05.016. Epub 2023 Jun 3.

DeepSSPred: A Deep Learning Based Sulfenylation Site Predictor Via a Novel nSegmented Optimize Federated Feature Encoder.

Protein Pept Lett. 2021;28(6):708-721. doi: 10.2174/0929866527666201202103411.

STALLION: a stacking-based ensemble learning framework for prokaryotic lysine acetylation site prediction.

Brief Bioinform. 2022 Jan 17;23(1). doi: 10.1093/bib/bbab376.

THRONE: A New Approach for Accurate Prediction of Human RNA N7-Methylguanosine Sites.

J Mol Biol. 2022 Jun 15;434(11):167549. doi: 10.1016/j.jmb.2022.167549. Epub 2022 Mar 16.

Cited by

m5C RNA modification in colorectal cancer: mechanisms and therapeutic targets.

J Transl Med. 2025 Aug 21;23(1):948. doi: 10.1186/s12967-025-06985-3.

A deep learning model for prediction of lysine crotonylation sites by fusing multi-features based on multi-head self-attention mechanism.

Sci Rep. 2025 May 29;15(1):18940. doi: 10.1038/s41598-025-04058-5.

PMPred-AE: a computational model for the detection and interpretation of pathological myopia based on artificial intelligence.

Front Med (Lausanne). 2025 Mar 13;12:1529335. doi: 10.3389/fmed.2025.1529335. eCollection 2025.

RNA sequence analysis landscape: A comprehensive review of task types, databases, datasets, word embedding methods, and language models.

Heliyon. 2025 Jan 6;11(2):e41488. doi: 10.1016/j.heliyon.2024.e41488. eCollection 2025 Jan 30.

AISMPred: A Machine Learning Approach for Predicting Anti-Inflammatory Small Molecules.

Pharmaceuticals (Basel). 2024 Dec 15;17(12):1693. doi: 10.3390/ph17121693.

Meta-2OM: A multi-classifier meta-model for the accurate prediction of RNA 2'-O-methylation sites in human RNA.

PLoS One. 2024 Jun 26;19(6):e0305406. doi: 10.1371/journal.pone.0305406. eCollection 2024.

Molecular insights into regulatory RNAs in the cellular machinery.

Exp Mol Med. 2024 Jun;56(6):1235-1249. doi: 10.1038/s12276-024-01239-6. Epub 2024 Jun 14.

Big data and deep learning for RNA biology.

Exp Mol Med. 2024 Jun;56(6):1293-1321. doi: 10.1038/s12276-024-01243-w. Epub 2024 Jun 14.

ac4C-AFL: A high-precision identification of human mRNA N4-acetylcytidine sites based on adaptive feature representation learning.

Mol Ther Nucleic Acids. 2024 Apr 24;35(2):102192. doi: 10.1016/j.omtn.2024.102192. eCollection 2024 Jun 11.

Prediction of blood-brain barrier penetrating peptides based on data augmentation with Augur.

BMC Biol. 2024 Apr 19;22(1):86. doi: 10.1186/s12915-024-01883-4.
