Utilizing a deep learning model based on BERT for identifying enhancers and their strength.

Authors

Wang Tong, Gao Mengqi

Affiliation

School of Computer and Information Engineering, Shanghai Polytechnic University, Shanghai, China.

Publication

PLoS One. 2025 Apr 9;20(4):e0320085. doi: 10.1371/journal.pone.0320085. eCollection 2025.

DOI:10.1371/journal.pone.0320085
PMID:40203028
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11981215/
Abstract

An enhancer is a specific DNA sequence, typically located upstream or downstream of a gene or within it, that serves as a pivotal element in the regulation of eukaryotic gene transcription. Recognizing enhancers is therefore important for understanding gene expression regulation. Although several useful predictive models have been proposed, they still have shortcomings. To address these limitations, we propose DNABERT2-Enhancer, a deep learning model based on the transformer architecture, designed both to recognize enhancers (enhancer vs. non-enhancer) and to identify their activity (strong vs. weak enhancer). DNABERT2-Enhancer combines a BERT model, which extracts features, with a CNN model, which classifies enhancers. The BERT parameters are initialized from the pre-trained DNABERT-2 language model and then fine-tuned on the enhancer recognition task through transfer learning, so that each raw sequence is converted into feature vectors. The CNN then learns from the feature vectors generated by BERT and produces the prediction results. Compared with existing predictors evaluated on the same dataset, our approach achieves superior performance, suggesting that the model will be a useful tool for research on enhancer recognition.
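The abstract describes the classifier only as "BERT features in, CNN prediction out." As a minimal sketch of what such a head could look like, the code below slides a 1D convolution along the token axis of a BERT output matrix, applies ReLU, max-pools each filter over all positions, and maps the pooled vector to a probability with a sigmoid. Everything here is an assumption for illustration: the single convolutional layer, global max pooling, the toy sizes, and the function names (`conv1d_relu`, `global_max_pool`, `cnn_head`) are hypothetical and not taken from the paper. Random numbers stand in for real DNABERT-2 embeddings and for trained weights.

```python
import math
import random
from typing import List

Matrix = List[List[float]]  # rows are tokens, columns are hidden dimensions


def conv1d_relu(features: Matrix, kernels: List[Matrix], bias: List[float]) -> Matrix:
    """'Valid' 1D convolution along the token axis, followed by ReLU.

    features: L x H token embeddings (stand-in for BERT's last hidden layer).
    kernels:  F filters, each a K x H matrix spanning K consecutive tokens.
    Returns:  an (L - K + 1) x F activation map.
    """
    k = len(kernels[0])
    out: Matrix = []
    for start in range(len(features) - k + 1):
        window = features[start:start + k]
        row = []
        for f, kernel in enumerate(kernels):
            s = bias[f]
            for w_row, x_row in zip(kernel, window):
                s += sum(w * x for w, x in zip(w_row, x_row))
            row.append(max(0.0, s))  # ReLU
        out.append(row)
    return out


def global_max_pool(activations: Matrix) -> List[float]:
    """Keep each filter's strongest response over all token positions."""
    return [max(column) for column in zip(*activations)]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def cnn_head(features: Matrix, kernels: List[Matrix], bias: List[float],
             w_out: List[float], b_out: float) -> float:
    """Conv -> ReLU -> global max pool -> linear -> sigmoid; returns P(positive class)."""
    pooled = global_max_pool(conv1d_relu(features, kernels, bias))
    logit = b_out + sum(w * p for w, p in zip(w_out, pooled))
    return sigmoid(logit)


if __name__ == "__main__":
    rng = random.Random(0)
    L, H, F, K = 20, 8, 4, 3  # hypothetical toy sizes; a real BERT hidden size is far larger
    features = [[rng.gauss(0.0, 1.0) for _ in range(H)] for _ in range(L)]
    kernels = [[[rng.gauss(0.0, 0.1) for _ in range(H)] for _ in range(K)] for _ in range(F)]
    bias = [0.0] * F
    w_out = [rng.gauss(0.0, 0.1) for _ in range(F)]
    print(f"P(enhancer) = {cnn_head(features, kernels, bias, w_out, 0.0):.3f}")
```

In practice a head like this would be written in a deep learning framework and trained jointly with the fine-tuned BERT encoder; plain Python is used here only so that each arithmetic step is visible.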

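The two tasks named in the abstract (enhancer vs. non-enhancer, then strong vs. weak enhancer) are often run as a two-layer cascade, in which the strength classifier is applied only to sequences that the first layer accepts. The sketch below shows that control flow under stated assumptions: the abstract does not say whether DNABERT2-Enhancer chains its classifiers this way, the 0.5 threshold is an assumption, and `gc_fraction` and `constant_scorer` are hypothetical placeholder scorers, not real enhancer predictors.

```python
from typing import Callable

# A scorer maps a DNA sequence to a probability in [0, 1].
Scorer = Callable[[str], float]


def predict_two_layer(seq: str, layer1: Scorer, layer2: Scorer,
                      threshold: float = 0.5) -> str:
    """Layer 1 decides enhancer vs. non-enhancer; layer 2 (strength) runs only on enhancers."""
    if layer1(seq) < threshold:
        return "non-enhancer"
    return "strong enhancer" if layer2(seq) >= threshold else "weak enhancer"


def gc_fraction(seq: str) -> float:
    """Hypothetical placeholder scorer (GC content); NOT a biological enhancer model."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def constant_scorer(value: float) -> Scorer:
    """Hypothetical placeholder that returns the same score for every sequence."""
    def score(_seq: str) -> float:
        return value
    return score


if __name__ == "__main__":
    for s in ["GGCCGCGTAC", "ATATATTTAA"]:
        print(s, "->", predict_two_layer(s, gc_fraction, constant_scorer(0.7)))
```

In a real pipeline, each scorer would be a separately fine-tuned model of the kind sketched for the abstract's BERT-plus-CNN design; the cascade itself only fixes the order in which the two decisions are made.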