
BERT-TFBS: a novel BERT-based model for predicting transcription factor binding sites by transfer learning.

Author information

Wang Kai, Zeng Xuan, Zhou Jingwen, Liu Fei, Luan Xiaoli, Wang Xinglong

Affiliations

Key Laboratory of Advanced Process Control for Light Industry (Ministry of Education), School of Internet of Things Engineering, Jiangnan University, 1800 Lihu Road, Wuxi, Jiangsu 214122, China.

Science Center for Future Foods, Jiangnan University, 1800 Lihu Road, Wuxi, Jiangsu 214122, China.

Publication information

Brief Bioinform. 2024 Mar 27;25(3). doi: 10.1093/bib/bbae195.

Abstract

Transcription factors (TFs) are proteins essential for regulating gene transcription by binding to transcription factor binding sites (TFBSs) in DNA sequences. Accurate prediction of TFBSs can contribute to the design and construction of TF-based metabolic regulatory systems. Although various deep-learning algorithms have been developed for predicting TFBSs, their prediction performance still needs improvement. This paper proposes a bidirectional encoder representations from transformers (BERT)-based model, called BERT-TFBS, that predicts TFBSs solely from DNA sequences. The model consists of a pre-trained BERT module (DNABERT-2), a convolutional neural network (CNN) module, a convolutional block attention module (CBAM) and an output module. BERT-TFBS uses the pre-trained DNABERT-2 module to capture complex long-term dependencies in DNA sequences through transfer learning, and applies the CNN module and the CBAM to extract high-order local features. The model is trained and tested on 165 ENCODE ChIP-seq datasets. We conducted experiments with model variants, cross-cell-line validations and comparisons with other models. The results demonstrate the effectiveness and generalization capability of BERT-TFBS in predicting TFBSs, and show that it outperforms other deep-learning models. The source code for BERT-TFBS is available at https://github.com/ZX1998-12/BERT-TFBS.
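The abstract names CBAM as the attention component applied after the CNN module. CBAM gates a feature map twice, first per channel and then per position. As a rough illustration only (the function names, weight shapes, random parameters and the 1D simplification for sequence data are assumptions of this sketch, not details from the paper), a minimal NumPy version of channel-then-spatial attention might look like:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(x, w1, w2):
    """Channel attention: pool over the sequence axis, pass both pooled
    vectors through a shared two-layer MLP, and gate each channel.
    x: (C, L) feature map; w1: (C//r, C); w2: (C, C//r)."""
    avg = x.mean(axis=1)                             # (C,)
    mx = x.max(axis=1)                               # (C,)
    mlp = lambda v: w2 @ np.maximum(w1 @ v, 0.0)     # ReLU between layers
    gate = sigmoid(mlp(avg) + mlp(mx))               # (C,) values in (0, 1)
    return x * gate[:, None]

def spatial_attention(x, kernel):
    """Spatial attention: pool over the channel axis, convolve the
    stacked 2-channel map, and gate each position.
    x: (C, L); kernel: (2, k) with odd k."""
    avg = x.mean(axis=0)                             # (L,)
    mx = x.max(axis=0)                               # (L,)
    k = kernel.shape[1]
    pad = k // 2
    stacked = np.stack([np.pad(avg, pad), np.pad(mx, pad)])  # (2, L + 2*pad)
    L = x.shape[1]
    conv = np.array([np.sum(stacked[:, i:i + k] * kernel) for i in range(L)])
    return x * sigmoid(conv)[None, :]                # gate per position

def cbam(x, w1, w2, kernel):
    """Sequential channel-then-spatial attention, the CBAM ordering."""
    return spatial_attention(channel_attention(x, w1, w2), kernel)
```

Because each stage multiplies the input by a sigmoid gate in (0, 1), the output never exceeds the input in magnitude; the module only re-weights which channels and which sequence positions the downstream output layers attend to.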


Graphical abstract: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fb6f/11066948/3ee19d4791b0/bbae195f1.jpg
