
BertSNR: an interpretable deep learning framework for single-nucleotide resolution identification of transcription factor binding sites based on DNA language model.

Affiliations

School of Computer Science and Engineering, Central South University, Changsha, Hunan 410083, China.

School of Computer Science, University of South China, Hengyang, Hunan 421001, China.

Publication information

Bioinformatics. 2024 Aug 2;40(8). doi: 10.1093/bioinformatics/btae461.

PMID: 39107889
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11310455/
Abstract

MOTIVATION

Transcription factors are pivotal in the regulation of gene expression, and accurate identification of transcription factor binding sites (TFBSs) at high resolution is crucial for understanding the mechanisms underlying gene regulation. The task of identifying TFBSs from DNA sequences is a significant challenge in the field of computational biology today. To address this challenge, a variety of computational approaches have been developed. However, these methods face limitations in their ability to achieve high-resolution identification and often lack interpretability.

RESULTS

We propose BertSNR, an interpretable deep learning framework for identifying TFBSs at single-nucleotide resolution. BertSNR integrates sequence-level and token-level information by multi-task learning based on pre-trained DNA language models. Benchmarking comparisons show that our BertSNR outperforms the existing state-of-the-art methods in TFBS predictions. Importantly, we enhanced the interpretability of the model through attentional weight visualization and motif analysis, and discovered the subtle relationship between attention weight and motif. Moreover, BertSNR effectively identifies TFBSs in promoter regions, facilitating the study of intricate gene regulation.

AVAILABILITY AND IMPLEMENTATION

The BertSNR source code can be found at https://github.com/lhy0322/BertSNR.
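The abstract describes a multi-task design: sequence-level and token-level predictions from a pre-trained DNA language model. DNABERT-style models (cited below) tokenize DNA into overlapping k-mers, so producing single-nucleotide labels requires mapping per-token predictions back to individual positions. A minimal sketch of one plausible mapping, majority voting over the k-mers covering each nucleotide, is shown here; this is an illustrative assumption, not necessarily the authors' exact scheme:

```python
def kmer_tokens(seq, k=3):
    # Overlapping k-mer tokenization, as used by DNABERT-style models:
    # a sequence of length n yields n - k + 1 tokens.
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def token_to_nucleotide_labels(token_labels, k=3):
    # Map per-token binary labels (1 = binding, 0 = non-binding) back to
    # per-nucleotide labels. Each nucleotide is covered by up to k
    # overlapping k-mers; we take a majority vote, breaking ties toward 1.
    n = len(token_labels) + k - 1
    votes = [[] for _ in range(n)]
    for i, label in enumerate(token_labels):
        for pos in range(i, i + k):
            votes[pos].append(label)
    return [1 if 2 * sum(v) >= len(v) else 0 for v in votes]
```

For example, five 3-mer tokens labeled `[0, 1, 1, 1, 0]` cover seven nucleotides; positions near the boundary receive fewer votes, which is one reason edge positions are harder to resolve than interior ones.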


Figures (PMC full text):
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/44d6/11310455/91cbe68ead02/btae461f1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/44d6/11310455/65054e66b8de/btae461f2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/44d6/11310455/dbb8ccfb9e0a/btae461f3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/44d6/11310455/c49577285ce4/btae461f4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/44d6/11310455/bd58df6653fd/btae461f5.jpg

Similar articles

1
BertSNR: an interpretable deep learning framework for single-nucleotide resolution identification of transcription factor binding sites based on DNA language model.
Bioinformatics. 2024 Aug 2;40(8). doi: 10.1093/bioinformatics/btae461.
2
BERT-TFBS: a novel BERT-based model for predicting transcription factor binding sites by transfer learning.
Brief Bioinform. 2024 Mar 27;25(3). doi: 10.1093/bib/bbae195.
3
MLSNet: a deep learning model for predicting transcription factor binding sites.
Brief Bioinform. 2024 Sep 23;25(6). doi: 10.1093/bib/bbae489.
4
Base-pair resolution detection of transcription factor binding site by deep deconvolutional network.
Bioinformatics. 2018 Oct 15;34(20):3446-3453. doi: 10.1093/bioinformatics/bty383.
5
High-resolution transcription factor binding sites prediction improved performance and interpretability by deep learning method.
Brief Bioinform. 2021 Nov 5;22(6). doi: 10.1093/bib/bbab273.
6
A novel convolution attention model for predicting transcription factor binding sites by combination of sequence and shape.
Brief Bioinform. 2022 Jan 17;23(1). doi: 10.1093/bib/bbab525.
7
An intuitionistic approach to scoring DNA sequences against transcription factor binding site motifs.
BMC Bioinformatics. 2010 Nov 8;11:551. doi: 10.1186/1471-2105-11-551.
8
Molecular and structural considerations of TF-DNA binding for the generation of biologically meaningful and accurate phylogenetic footprinting analysis: the LysR-type transcriptional regulator family as a study model.
BMC Genomics. 2016 Aug 27;17(1):686. doi: 10.1186/s12864-016-3025-3.
9
GraphPro: an interpretable graph neural network-based model for identifying promoters in multiple species.
Comput Biol Med. 2024 Sep;180:108974. doi: 10.1016/j.compbiomed.2024.108974. Epub 2024 Aug 2.
10
DeepD2V: a novel deep learning-based framework for predicting transcription factor binding sites from combined DNA sequence.
Int J Mol Sci. 2021 May 24;22(11):5521. doi: 10.3390/ijms22115521.

References cited in this article

1
Applications of transformer-based language models in bioinformatics: a survey.
Bioinform Adv. 2023 Jan 11;3(1):vbad001. doi: 10.1093/bioadv/vbad001. eCollection 2023.
2
The UCSC Genome Browser database: 2023 update.
Nucleic Acids Res. 2023 Jan 6;51(D1):D1188-D1195. doi: 10.1093/nar/gkac1072.
3
Improving language model of human genome for DNA-protein binding prediction based on task-specific pre-training.
Interdiscip Sci. 2023 Mar;15(1):32-43. doi: 10.1007/s12539-022-00537-9. Epub 2022 Sep 22.
4
A novel convolution attention model for predicting transcription factor binding sites by combination of sequence and shape.
Brief Bioinform. 2022 Jan 17;23(1). doi: 10.1093/bib/bbab525.
5
JASPAR 2022: the 9th release of the open-access database of transcription factor binding profiles.
Nucleic Acids Res. 2022 Jan 7;50(D1):D165-D173. doi: 10.1093/nar/gkab1113.
6
Effective gene expression prediction from sequence by integrating long-range interactions.
Nat Methods. 2021 Oct;18(10):1196-1203. doi: 10.1038/s41592-021-01252-x. Epub 2021 Oct 4.
7
High-resolution transcription factor binding sites prediction improved performance and interpretability by deep learning method.
Brief Bioinform. 2021 Nov 5;22(6). doi: 10.1093/bib/bbab273.
8
SAResNet: self-attention residual network for predicting DNA-protein binding.
Brief Bioinform. 2021 Sep 2;22(5). doi: 10.1093/bib/bbab101.
9
DNABERT: pre-trained Bidirectional Encoder Representations from Transformers model for DNA-language in genome.
Bioinformatics. 2021 Aug 9;37(15):2112-2120. doi: 10.1093/bioinformatics/btab083.
10
A survey on deep learning in DNA/RNA motif mining.
Brief Bioinform. 2021 Jul 20;22(4). doi: 10.1093/bib/bbaa229.