DeepMethylation: a deep learning based framework with GloVe and Transformer encoder for DNA methylation prediction.

Affiliations

Wuhan University of Science and Technology, Wuhan, Hubei, China.

China Three Gorges University, Yichang, Hubei, China.

Publication information

PeerJ. 2023 Sep 25;11:e16125. doi: 10.7717/peerj.16125. eCollection 2023.

DOI:10.7717/peerj.16125
PMID:37780374
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10538282/
Abstract

DNA methylation is a crucial topic in bioinformatics research. Traditional wet experiments are usually time-consuming and expensive. In contrast, machine learning offers an efficient and novel approach. In this study, we propose DeepMethylation, a novel methylation predictor with deep learning. Specifically, the DNA sequence is encoded with word embedding and GloVe in the first step. After that, dilated convolution and Transformer encoder are utilized to extract the features. Finally, full connection and softmax operators are applied to predict the methylation sites. The proposed model achieves an accuracy of 97.8% on the 5mC dataset, which outperforms state-of-the-art methods. Furthermore, our predictor exhibits good generalization ability as it achieves an accuracy of 95.8% on the m1A dataset. To ease access for other researchers, our code is publicly available at https://github.com/sb111169/tf-5mc.
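
The abstract names four stages: sequence encoding with word embedding and GloVe, dilated convolution, a Transformer encoder, and a fully connected softmax output. As a rough illustration only, the sketch below shows toy, standard-library versions of three of those building blocks. It is not the authors' implementation (their code is at the GitHub URL above). The abstract does not say how sequences are split into words, so the use of overlapping k-mers, the value `k=3`, the window size, the kernel, and the dilation rate are all assumptions chosen for the example, and every function name is hypothetical. The Transformer encoder is left out to keep the block short.

```python
import math
from collections import Counter


def kmer_tokens(seq, k=3):
    """Split a DNA sequence into overlapping k-mers ("words").
    Both k-mer tokenization and k=3 are assumptions for illustration."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def cooccurrence(tokens, window=2):
    """Count distance-weighted co-occurrences in a symmetric window.
    This is the kind of statistic GloVe fits embeddings to;
    a pair at distance d contributes 1/d."""
    counts = Counter()
    for i, center in enumerate(tokens):
        for d in range(1, window + 1):
            if i + d < len(tokens):
                counts[(center, tokens[i + d])] += 1.0 / d
                counts[(tokens[i + d], center)] += 1.0 / d
    return counts


def dilated_conv1d(xs, kernel, dilation=2):
    """Valid 1-D dilated convolution (cross-correlation form):
    kernel taps are spaced `dilation` positions apart."""
    span = (len(kernel) - 1) * dilation
    return [
        sum(w * xs[i + j * dilation] for j, w in enumerate(kernel))
        for i in range(len(xs) - span)
    ]


def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Toy inputs, not data from the paper.
tokens = kmer_tokens("ACGTACGT", k=3)
pairs = cooccurrence(tokens, window=2)
features = dilated_conv1d([1.0, 2.0, 3.0, 4.0, 5.0],
                          kernel=[1.0, 0.0, -1.0], dilation=2)
probs = softmax([2.0, 0.5])  # two classes: methylated vs. not
```

In a real model the co-occurrence counts would be used to train embedding vectors, and the convolution would run over those vectors rather than over a list of scalars. The point here is only the spacing of the kernel taps: with dilation 2, a three-tap kernel covers five input positions.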

Similar articles

1. DeepMethylation: a deep learning based framework with GloVe and Transformer encoder for DNA methylation prediction.
   PeerJ. 2023 Sep 25;11:e16125. doi: 10.7717/peerj.16125. eCollection 2023.
2. EMDLP: Ensemble multiscale deep learning model for RNA methylation site prediction.
   BMC Bioinformatics. 2022 Jun 8;23(1):221. doi: 10.1186/s12859-022-04756-1.
3. EnAMP: A novel deep learning ensemble antibacterial peptide recognition algorithm based on multi-features.
   J Bioinform Comput Biol. 2024 Feb;22(1):2450001. doi: 10.1142/S021972002450001X. Epub 2024 Feb 26.
4. TEC-miTarget: enhancing microRNA target prediction based on deep learning of ribonucleic acid sequences.
   BMC Bioinformatics. 2024 Apr 20;25(1):159. doi: 10.1186/s12859-024-05780-z.
5. iDNA-ABT: advanced deep learning model for detecting DNA methylation with adaptive features and transductive information maximization.
   Bioinformatics. 2021 Dec 11;37(24):4603-4610. doi: 10.1093/bioinformatics/btab677.
6. KGETCDA: an efficient representation learning framework based on knowledge graph encoder from transformer for predicting circRNA-disease associations.
   Brief Bioinform. 2023 Sep 20;24(5). doi: 10.1093/bib/bbad292.
7. DeepSSPred: A Deep Learning Based Sulfenylation Site Predictor Via a Novel nSegmented Optimize Federated Feature Encoder.
   Protein Pept Lett. 2021;28(6):708-721. doi: 10.2174/0929866527666201202103411.
8. Deep-WET: a deep learning-based approach for predicting DNA-binding proteins using word embedding techniques with weighted features.
   Sci Rep. 2024 Feb 5;14(1):2961. doi: 10.1038/s41598-024-52653-9.
9. BERT-5mC: an interpretable model for predicting 5-methylcytosine sites of DNA based on BERT.
   PeerJ. 2023 Dec 8;11:e16600. doi: 10.7717/peerj.16600. eCollection 2023.
10. DeepSignal: detecting DNA methylation state from Nanopore sequencing reads using deep-learning.
    Bioinformatics. 2019 Nov 1;35(22):4586-4595. doi: 10.1093/bioinformatics/btz276.

Cited by

1. Genome language modeling (GLM): a beginner's cheat sheet.
   Biol Methods Protoc. 2025 Mar 25;10(1):bpaf022. doi: 10.1093/biomethods/bpaf022. eCollection 2025.
2. DNA sequence analysis landscape: a comprehensive review of DNA sequence analysis task types, databases, datasets, word embedding methods, and language models.
   Front Med (Lausanne). 2025 Apr 8;12:1503229. doi: 10.3389/fmed.2025.1503229. eCollection 2025.
