
BertSNR: an interpretable deep learning framework for single-nucleotide resolution identification of transcription factor binding sites based on DNA language model.

Affiliations

School of Computer Science and Engineering, Central South University, Changsha, Hunan 410083, China.

School of Computer Science, University of South China, Hengyang, Hunan 421001, China.

Publication information

Bioinformatics. 2024 Aug 2;40(8). doi: 10.1093/bioinformatics/btae461.

PMID: 39107889
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11310455/
Abstract

MOTIVATION

Transcription factors are pivotal in the regulation of gene expression, and accurate identification of transcription factor binding sites (TFBSs) at high resolution is crucial for understanding the mechanisms underlying gene regulation. The task of identifying TFBSs from DNA sequences is a significant challenge in the field of computational biology today. To address this challenge, a variety of computational approaches have been developed. However, these methods face limitations in their ability to achieve high-resolution identification and often lack interpretability.

RESULTS

We propose BertSNR, an interpretable deep learning framework for identifying TFBSs at single-nucleotide resolution. BertSNR integrates sequence-level and token-level information by multi-task learning based on pre-trained DNA language models. Benchmarking comparisons show that our BertSNR outperforms the existing state-of-the-art methods in TFBS predictions. Importantly, we enhanced the interpretability of the model through attentional weight visualization and motif analysis, and discovered the subtle relationship between attention weight and motif. Moreover, BertSNR effectively identifies TFBSs in promoter regions, facilitating the study of intricate gene regulation.

AVAILABILITY AND IMPLEMENTATION

The BertSNR source code can be found at https://github.com/lhy0322/BertSNR.
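The abstract describes a multi-task design: sequence-level and token-level predictions from a pre-trained DNA language model. DNABERT-style models (cited below) tokenize DNA into overlapping k-mers, so producing single-nucleotide labels requires mapping per-token predictions back to individual positions. A minimal sketch of one plausible mapping, majority voting over the k-mers covering each nucleotide, is shown here; this is an illustrative assumption, not necessarily the authors' exact scheme:

```python
def kmer_tokens(seq, k=3):
    # Overlapping k-mer tokenization, as used by DNABERT-style models:
    # a sequence of length n yields n - k + 1 tokens.
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def token_to_nucleotide_labels(token_labels, k=3):
    # Map per-token binary labels (1 = binding, 0 = non-binding) back to
    # per-nucleotide labels. Each nucleotide is covered by up to k
    # overlapping k-mers; we take a majority vote, breaking ties toward 1.
    n = len(token_labels) + k - 1
    votes = [[] for _ in range(n)]
    for i, label in enumerate(token_labels):
        for pos in range(i, i + k):
            votes[pos].append(label)
    return [1 if 2 * sum(v) >= len(v) else 0 for v in votes]
```

For example, five 3-mer tokens labeled `[0, 1, 1, 1, 0]` cover seven nucleotides; positions near the boundary receive fewer votes, which is one reason edge positions are harder to resolve than interior ones.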


Figures (PMC full text):
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/44d6/11310455/91cbe68ead02/btae461f1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/44d6/11310455/65054e66b8de/btae461f2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/44d6/11310455/dbb8ccfb9e0a/btae461f3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/44d6/11310455/c49577285ce4/btae461f4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/44d6/11310455/bd58df6653fd/btae461f5.jpg

Similar articles

1
BertSNR: an interpretable deep learning framework for single-nucleotide resolution identification of transcription factor binding sites based on DNA language model.
Bioinformatics. 2024 Aug 2;40(8). doi: 10.1093/bioinformatics/btae461.
2
BERT-TFBS: a novel BERT-based model for predicting transcription factor binding sites by transfer learning.
Brief Bioinform. 2024 Mar 27;25(3). doi: 10.1093/bib/bbae195.
3
MLSNet: a deep learning model for predicting transcription factor binding sites.
Brief Bioinform. 2024 Sep 23;25(6). doi: 10.1093/bib/bbae489.
4
Base-pair resolution detection of transcription factor binding site by deep deconvolutional network.
Bioinformatics. 2018 Oct 15;34(20):3446-3453. doi: 10.1093/bioinformatics/bty383.
5
High-resolution transcription factor binding sites prediction improved performance and interpretability by deep learning method.
Brief Bioinform. 2021 Nov 5;22(6). doi: 10.1093/bib/bbab273.
6
A novel convolution attention model for predicting transcription factor binding sites by combination of sequence and shape.
Brief Bioinform. 2022 Jan 17;23(1). doi: 10.1093/bib/bbab525.
7
An intuitionistic approach to scoring DNA sequences against transcription factor binding site motifs.
BMC Bioinformatics. 2010 Nov 8;11:551. doi: 10.1186/1471-2105-11-551.
8
Molecular and structural considerations of TF-DNA binding for the generation of biologically meaningful and accurate phylogenetic footprinting analysis: the LysR-type transcriptional regulator family as a study model.
BMC Genomics. 2016 Aug 27;17(1):686. doi: 10.1186/s12864-016-3025-3.
9
GraphPro: an interpretable graph neural network-based model for identifying promoters in multiple species.
Comput Biol Med. 2024 Sep;180:108974. doi: 10.1016/j.compbiomed.2024.108974. Epub 2024 Aug 2.
10
DeepD2V: a novel deep learning-based framework for predicting transcription factor binding sites from combined DNA sequence.
Int J Mol Sci. 2021 May 24;22(11):5521. doi: 10.3390/ijms22115521.

References cited in this article

1
Applications of transformer-based language models in bioinformatics: a survey.
Bioinform Adv. 2023 Jan 11;3(1):vbad001. doi: 10.1093/bioadv/vbad001. eCollection 2023.
2
The UCSC Genome Browser database: 2023 update.
Nucleic Acids Res. 2023 Jan 6;51(D1):D1188-D1195. doi: 10.1093/nar/gkac1072.
3
Improving language model of human genome for DNA-protein binding prediction based on task-specific pre-training.
Interdiscip Sci. 2023 Mar;15(1):32-43. doi: 10.1007/s12539-022-00537-9. Epub 2022 Sep 22.
4
A novel convolution attention model for predicting transcription factor binding sites by combination of sequence and shape.
Brief Bioinform. 2022 Jan 17;23(1). doi: 10.1093/bib/bbab525.
5
JASPAR 2022: the 9th release of the open-access database of transcription factor binding profiles.
Nucleic Acids Res. 2022 Jan 7;50(D1):D165-D173. doi: 10.1093/nar/gkab1113.
6
Effective gene expression prediction from sequence by integrating long-range interactions.
Nat Methods. 2021 Oct;18(10):1196-1203. doi: 10.1038/s41592-021-01252-x. Epub 2021 Oct 4.
7
High-resolution transcription factor binding sites prediction improved performance and interpretability by deep learning method.
Brief Bioinform. 2021 Nov 5;22(6). doi: 10.1093/bib/bbab273.
8
SAResNet: self-attention residual network for predicting DNA-protein binding.
Brief Bioinform. 2021 Sep 2;22(5). doi: 10.1093/bib/bbab101.
9
DNABERT: pre-trained Bidirectional Encoder Representations from Transformers model for DNA-language in genome.
Bioinformatics. 2021 Aug 9;37(15):2112-2120. doi: 10.1093/bioinformatics/btab083.
10
A survey on deep learning in DNA/RNA motif mining.
Brief Bioinform. 2021 Jul 20;22(4). doi: 10.1093/bib/bbaa229.