
Enhancing the interpretability of transcription factor binding site prediction using attention mechanism.

Affiliations

Department of Computer Science and Engineering, Korea University, Seoul, South Korea.

Interdisciplinary Graduate Program in Bioinformatics, Korea University, Seoul, South Korea.

Publication information

Sci Rep. 2020 Aug 7;10(1):13413. doi: 10.1038/s41598-020-70218-4.

Abstract

Transcription factors (TFs) regulate the expression of their target genes by binding to those genes' regulatory sequences (e.g., promoters and enhancers). To fully understand gene regulatory mechanisms, it is crucial to decipher the relationships between TFs and DNA sequences. Moreover, studies such as GWAS and eQTL have verified that most disease-related variants exist in non-coding regions, and have highlighted the need to identify variants that cause disease by disrupting TF binding. To do this, it is necessary to build a model that precisely predicts the binding relationships between TFs and DNA sequences. Recently, deep learning-based models have been proposed and have shown competitive results on the transcription factor binding site prediction task. However, the predictions of these previous models are difficult to interpret. In addition, the previous models assumed that all regions of the input DNA sequence are equally important for predicting TF binding, although regions containing TF-binding-associated signals such as TF-binding motifs should be weighted more heavily than other regions. To address these challenges, we propose TBiNet, an attention-based interpretable deep neural network for predicting transcription factor binding sites. Using the attention mechanism, our method is able to assign more importance to the actual TF binding sites in the input DNA sequence. TBiNet quantitatively outperforms the current state-of-the-art methods (DeepSea and DanQ) in the TF-DNA binding prediction task. Moreover, TBiNet is more effective than the previous models in discovering known TF-binding motifs.
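The idea of weighting sequence positions unequally can be illustrated with a minimal sketch. This is not the TBiNet architecture, whose details the abstract does not give; it only shows generic softmax attention over the positions of a one-hot encoded DNA sequence. The function names (`one_hot_dna`, `position_attention`), the example sequence, and the random scoring vector `w` are all assumptions made for illustration; in a trained model the scoring parameters would be learned.

```python
import numpy as np

def one_hot_dna(seq):
    """Encode a DNA string as a (length, 4) one-hot matrix over A, C, G, T."""
    index = {"A": 0, "C": 1, "G": 2, "T": 3}
    encoded = np.zeros((len(seq), 4))
    for pos, base in enumerate(seq.upper()):
        if base in index:  # unknown bases such as N stay all-zero
            encoded[pos, index[base]] = 1.0
    return encoded

def position_attention(features, w):
    """Score each position, normalize with softmax, and reweight the features.

    features: (length, dim) per-position feature matrix
    w:        (dim,) scoring vector (hypothetical; learned in a real model)
    Returns (weights, attended) where weights sums to 1 over positions.
    """
    scores = features @ w                      # one scalar score per position
    scores = scores - scores.max()             # shift for numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()
    attended = features * weights[:, None]     # scale each position by its weight
    return weights, attended

rng = np.random.default_rng(0)
x = one_hot_dna("ACGTTGACGTCA")                # toy sequence, not real data
w = rng.normal(size=4)                         # stand-in for learned parameters
weights, attended = position_attention(x, w)
print(weights.round(3))
```

Positions with larger weights contribute more to whatever layer consumes `attended`, and inspecting `weights` is what makes this kind of model easier to interpret than one that treats every position alike.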

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b4d9/7414127/ea0d1575955a/41598_2020_70218_Fig1_HTML.jpg
