DNA sequences performs as natural language processing by exploiting deep learning algorithm for the identification of N4-methylcytosine.

Affiliations

Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju, 54896, South Korea.

School of International Engineering and Science, Jeonbuk National University, Jeonju, 54896, South Korea.

Publication information

Sci Rep. 2021 Jan 8;11(1):212. doi: 10.1038/s41598-020-80430-x.

DOI: 10.1038/s41598-020-80430-x
PMID: 33420191
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7794489/
Abstract

N4-methylcytosine (4mC) is a biochemical alteration of DNA that affects genetic operations such as gene expression, genomic imprinting, chromosome stability, and cell development without modifying the DNA nucleotides. In the proposed work, a computational model, 4mCNLP-Deep, uses word embeddings as the vector representation of sequences and a deep learning based CNN to predict 4mC and non-4mC sites in a C. elegans genome dataset. A range of settings was tried in the experiments, including the k-mer size of the corpus and the number of folds in k-fold cross-validation, to find the best-performing configuration. With a 3-mer word2vec corpus and 3-fold cross-validation, 4mCNLP-Deep outperforms the state-of-the-art predictor on five evaluation metrics: accuracy (ACC) of 0.9354, Matthews correlation coefficient (MCC) of 0.8608, specificity (Sp) of 0.8996, sensitivity (Sn) of 0.9563, and area under the curve (AUC) of 0.9731, improvements of 1.1%, 0.6%, 0.58%, 0.77%, and 4.89%, respectively. Finally, we developed an online web server, http://nsclbio.jbnu.ac.kr/tools/4mCNLP-Deep/, so that experimental researchers can obtain results easily.
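The abstract treats DNA as text: sequences are cut into k-mer "words" before embedding, and predictions are scored with ACC, Sn, Sp, and MCC. The Python sketch below illustrates both steps under stated assumptions and is not the authors' implementation. The function names `kmer_tokens` and `binary_metrics` are hypothetical, overlapping k-mers with a stride of 1 are assumed because the abstract does not give the stride, and the confusion-matrix counts in the usage lines are toy values rather than results from the paper.

```python
import math


def kmer_tokens(sequence: str, k: int = 3) -> list[str]:
    """Split a DNA sequence into overlapping k-mers (stride 1 is an assumption)."""
    sequence = sequence.upper()
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def binary_metrics(tp: int, tn: int, fp: int, fn: int) -> dict[str, float]:
    """Compute ACC, Sn, Sp and MCC from binary confusion-matrix counts."""
    total = tp + tn + fp + fn
    # MCC denominator; it is zero when any row or column of the matrix is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "ACC": (tp + tn) / total,
        "Sn": tp / (tp + fn),   # sensitivity: recall on 4mC (positive) sites
        "Sp": tn / (tn + fp),   # specificity: recall on non-4mC (negative) sites
        "MCC": (tp * tn - fp * fn) / denom if denom else 0.0,
    }


# Toy usage: a 6-nucleotide sequence yields four 3-mer "words".
tokens = kmer_tokens("acgtac", k=3)
# Toy confusion-matrix counts, not taken from the paper.
scores = binary_metrics(tp=90, tn=80, fp=20, fn=10)
```

In a pipeline like the one the abstract describes, the token lists would form the corpus for a word2vec model, and the resulting vectors would be the CNN input. AUC is omitted here because it needs ranked prediction scores rather than counts.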

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/03b3/7794489/d7dea449b1a9/41598_2020_80430_Fig1_HTML.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/03b3/7794489/b6a9266e1c5c/41598_2020_80430_Fig2_HTML.jpg
Figure 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/03b3/7794489/a3eb36fdf95f/41598_2020_80430_Fig3_HTML.jpg
Figure 4: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/03b3/7794489/599f447bc2de/41598_2020_80430_Fig4_HTML.jpg
