Deep4mC: systematic assessment and computational prediction for DNA N4-methylcytosine sites by deep learning.

Affiliation

Center for Precision Health, School of Biomedical Informatics.

Publication information

Brief Bioinform. 2021 May 20;22(3). doi: 10.1093/bib/bbaa099.

Abstract

DNA N4-methylcytosine (4mC) modification is a novel form of epigenetic regulation. It is involved in various cellular processes, including DNA replication, cell cycle and gene expression, among others. In addition to experimental identification of 4mC sites, in silico prediction of 4mC sites in the genome has emerged as a promising alternative approach. In this study, we first reviewed the current progress in the computational prediction of 4mC sites and systematically evaluated the predictive capacity of eight conventional machine learning algorithms, as well as 12 feature types commonly used in previous studies, in six species. Using a representative benchmark dataset, we investigated the contribution of feature selection and the stacking approach to model construction, and found that feature optimization and proper reinforcement learning could improve performance. We next collected newly added 4mC sites in the genomes of the six species and developed a novel deep learning-based 4mC site predictor, named Deep4mC. Deep4mC applies convolutional neural networks with four representative features. For species with small numbers of samples, we extended our deep learning framework with a bootstrapping method. Our evaluation indicated that Deep4mC achieved high accuracy and robust performance, with average area under the curve (AUC) values greater than 0.9 in all species (range: 0.9005-0.9722). Compared with previous tools, Deep4mC improved AUC values by 10.14% to 46.21% in these six species. A user-friendly web server (https://bioinfo.uth.edu/Deep4mC) was built for predicting putative 4mC sites in a genome.
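The abstract names three building blocks without detailing them: sequence-derived features, bootstrapping for small datasets, and AUC as the evaluation metric. The following is a minimal Python sketch of each, not the authors' implementation. It assumes one-hot encoding as an example feature (the abstract does not say which four features Deep4mC uses), reads "bootstrapping" as resampling with replacement (the abstract does not specify the procedure), and computes AUC by pairwise comparison. The function names `one_hot`, `bootstrap_sample`, and `auc` are hypothetical.

```python
import random


def one_hot(seq):
    """Encode a DNA sequence as one 4-element binary vector per base (A, C, G, T).

    Assumption: one-hot encoding is used here only as a common example feature;
    bases outside ACGT (such as N) map to an all-zero vector.
    """
    alphabet = "ACGT"
    return [[1 if base == letter else 0 for letter in alphabet]
            for base in seq.upper()]


def bootstrap_sample(items, rng):
    """Draw len(items) elements from items with replacement.

    Assumption: this is one common reading of "bootstrapping", not
    necessarily the procedure used in the paper.
    """
    return [rng.choice(items) for _ in items]


def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Returns the fraction of (positive, negative) pairs in which the positive
    sample receives the higher score; ties count as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    rng = random.Random(0)
    windows = ["ACGTC", "TTCGA", "GGCAT"]  # hypothetical sequence windows
    resampled = bootstrap_sample(windows, rng)
    print(len(one_hot(resampled[0])), "positions encoded")
    print("AUC:", auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]))
```

The pairwise AUC is quadratic in the number of samples, which is fine for illustration; a rank-based formulation or a library routine would be the usual choice on genome-scale data.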

Similar articles

2
DeepTorrent: a deep learning-based approach for predicting DNA N4-methylcytosine sites.
Brief Bioinform. 2021 May 20;22(3). doi: 10.1093/bib/bbaa124.
3
Identifying DNA N4-methylcytosine sites in the rosaceae genome with a deep learning model relying on distributed feature representation.
Comput Struct Biotechnol J. 2021 Mar 19;19:1612-1619. doi: 10.1016/j.csbj.2021.03.015. eCollection 2021.
4
Exploring sequence-based features for the improved prediction of DNA N4-methylcytosine sites in multiple species.
Bioinformatics. 2019 Apr 15;35(8):1326-1333. doi: 10.1093/bioinformatics/bty824.
7
Developing a Multi-Layer Deep Learning Based Predictive Model to Identify DNA N4-Methylcytosine Modifications.
Front Bioeng Biotechnol. 2020 Apr 21;8:274. doi: 10.3389/fbioe.2020.00274. eCollection 2020.
10
A Deep Neural Network for Identifying DNA N4-Methylcytosine Sites.
Front Genet. 2020 Mar 6;11:209. doi: 10.3389/fgene.2020.00209. eCollection 2020.

Cited by

1
DeepRNAac4C: a hybrid deep learning framework for RNA N4-acetylcytidine site prediction.
Front Genet. 2025 Aug 25;16:1622899. doi: 10.3389/fgene.2025.1622899. eCollection 2025.
3
SVM-LncRNAPro: An SVM-Based Method for Predicting Long Noncoding RNA Promoters.
IET Syst Biol. 2025 Jan-Dec;19(1):e70013. doi: 10.1049/syb2.70013.
4
iResNetDM: An interpretable deep learning approach for four types of DNA methylation modification prediction.
Comput Struct Biotechnol J. 2024 Nov 13;23:4214-4221. doi: 10.1016/j.csbj.2024.11.006. eCollection 2024 Dec.
5
iDNA-ITLM: An interpretable and transferable learning model for identifying DNA methylation.
PLoS One. 2024 Oct 31;19(10):e0301791. doi: 10.1371/journal.pone.0301791. eCollection 2024.
6
Benchmarking DNA Foundation Models for Genomic Sequence Classification.
bioRxiv. 2024 Aug 18:2024.08.16.608288. doi: 10.1101/2024.08.16.608288.
7
iDNA-OpenPrompt: OpenPrompt learning model for identifying DNA methylation.
Front Genet. 2024 Apr 16;15:1377285. doi: 10.3389/fgene.2024.1377285. eCollection 2024.
