Deep Motif Dashboard: Visualizing and Understanding Genomic Sequences Using Deep Neural Networks

Authors

Lanchantin Jack, Singh Ritambhara, Wang Beilun, Qi Yanjun

Affiliation

Department of Computer Science, University of Virginia, Charlottesville, VA 22903, USA

Publication

Pac Symp Biocomput. 2017;22:254-265. doi: 10.1142/9789813207813_0025.

Abstract

Deep neural network (DNN) models have recently achieved state-of-the-art prediction accuracy on the transcription factor binding site (TFBS) classification task. However, it remains unclear how these approaches identify meaningful DNA sequence signals and what insight they give into why transcription factors (TFs) bind to certain locations. In this paper, we propose a toolkit called the Deep Motif Dashboard (DeMo Dashboard), which provides a suite of visualization strategies to extract motifs, or sequence patterns, from deep neural network models for TFBS classification. We demonstrate how to visualize and understand three important DNN models: convolutional, recurrent, and convolutional-recurrent networks. Our first visualization method computes a test sequence's saliency map, which uses first-order derivatives to describe the importance of each nucleotide in making the final prediction. Second, because recurrent models make predictions in a temporal manner (from one end of a TFBS sequence to the other), we introduce temporal output scores, which indicate a model's prediction score over time for a sequential input. Lastly, a class-specific visualization strategy finds the optimal input sequence for a given TFBS positive class via stochastic gradient optimization. Our experimental results indicate that the convolutional-recurrent architecture performs best among the three architectures. The visualization techniques indicate that the CNN-RNN makes predictions by modeling both motifs and the dependencies among them.
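The saliency-map idea in the abstract can be illustrated with a small sketch. This is not the DeMo Dashboard code: it assumes a hypothetical toy model (one convolutional filter `w` followed by max-pooling, no nonlinearity) so that the first-order derivative of the score S with respect to the one-hot input can be written by hand. For a real CNN or RNN the same gradient would come from an automatic-differentiation framework. The helpers `one_hot`, `conv_maxpool_score` and `saliency_map` are illustrative names.

```python
import numpy as np

BASES = "ACGT"


def one_hot(seq: str) -> np.ndarray:
    """Encode a DNA string as an (L, 4) one-hot matrix (columns A, C, G, T)."""
    x = np.zeros((len(seq), len(BASES)))
    for i, base in enumerate(seq):
        x[i, BASES.index(base)] = 1.0
    return x


def conv_maxpool_score(x: np.ndarray, w: np.ndarray) -> tuple[float, int]:
    """Hypothetical toy model: slide one (k, 4) filter over x, then max-pool.

    Returns the class score S and the start index of the winning window.
    """
    k = w.shape[0]
    acts = np.array([np.sum(x[i:i + k] * w) for i in range(len(x) - k + 1)])
    j = int(np.argmax(acts))
    return float(acts[j]), j


def saliency_map(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """First-order derivative dS/dx evaluated at the input x.

    Max-pooling routes the gradient only through the winning window,
    so dS/dx equals w there and 0 everywhere else.
    """
    _, j = conv_maxpool_score(x, w)
    grad = np.zeros_like(x)
    grad[j:j + w.shape[0]] = w
    return grad


# Hypothetical filter rewarding the motif TATA (+1 per match, -1 per mismatch).
w = 2.0 * one_hot("TATA") - 1.0
x = one_hot("GGTATAGG")
grad = saliency_map(x, w)

# Per-nucleotide importance: the gradient entry at each observed base.
importance = (grad * x).sum(axis=1)
print(importance)  # 1.0 at positions 2-5 (the TATA), 0.0 elsewhere
```

Reading off the gradient only at the observed base (gradient times input) is one common convention for turning dS/dx into a per-position track; other post-processing choices are possible.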
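Temporal output scores can be sketched the same way. This minimal example assumes a hypothetical single-unit Elman RNN with hand-set weights (a trained recurrent network, such as an LSTM, would take its place in practice). The score at position t is the sigmoid prediction the model would output if the sequence ended at t, so plotting the scores against t shows where the prediction changes. `temporal_output_scores`, `W`, `U`, `v` and `c` are illustrative names.

```python
import numpy as np

BASES = "ACGT"


def one_hot(seq: str) -> np.ndarray:
    """Encode a DNA string as an (L, 4) one-hot matrix (columns A, C, G, T)."""
    x = np.zeros((len(seq), len(BASES)))
    for i, base in enumerate(seq):
        x[i, BASES.index(base)] = 1.0
    return x


def temporal_output_scores(x, W, U, v, c=0.0):
    """Run a toy Elman RNN left to right over x (shape (L, 4)).

    After reading position t, apply the output layer to the hidden state
    h_t, giving the prediction the model would make if the sequence ended
    at t. Returns an array of L scores in (0, 1).
    """
    h = np.zeros(U.shape[0])
    scores = []
    for x_t in x:
        h = np.tanh(W @ x_t + U @ h)                        # recurrent update
        scores.append(1.0 / (1.0 + np.exp(-(v @ h + c))))   # sigmoid readout
    return np.array(scores)


# Hypothetical weights: one hidden unit excited by T and inhibited by
# A, C, G, with a self-connection that carries short-term memory.
W = np.array([[-1.0, -1.0, -1.0, 3.0]])
U = np.array([[2.0]])
v = np.array([4.0])

scores = temporal_output_scores(one_hot("AAATAA"), W, U, v)
print(np.round(scores, 3))
```

With these weights the score jumps when the model reads the T at position 3 and then decays over the following A's, which is the kind of change over time the temporal output score is meant to expose.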
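The class-specific visualization can be sketched as gradient ascent on a continuous input X. This sketch assumes a regularized objective S_c(X) - λ‖X‖², in the spirit of class-model visualization for image classifiers. It also reuses a hypothetical single-filter max-pooling model as S_c and uses plain full-batch gradient ascent rather than the stochastic optimizer mentioned in the abstract. The function names, step size `lr`, regularization weight `lam` and step count are illustrative choices.

```python
import numpy as np

BASES = "ACGT"


def one_hot(seq: str) -> np.ndarray:
    """Encode a DNA string as an (L, 4) one-hot matrix (columns A, C, G, T)."""
    x = np.zeros((len(seq), len(BASES)))
    for i, base in enumerate(seq):
        x[i, BASES.index(base)] = 1.0
    return x


def score_and_grad(X, w):
    """Hypothetical toy class score S_c(X): one (k, 4) filter, max-pooled.

    Returns S_c(X) and dS_c/dX (w in the winning window, 0 elsewhere).
    """
    k = w.shape[0]
    acts = np.array([np.sum(X[i:i + k] * w) for i in range(len(X) - k + 1)])
    j = int(np.argmax(acts))
    grad = np.zeros_like(X)
    grad[j:j + k] = w
    return float(acts[j]), grad


def optimal_input(w, length, lam=0.1, lr=0.1, steps=500):
    """Gradient ascent on S_c(X) - lam * ||X||^2, starting from X = 0."""
    X = np.zeros((length, len(BASES)))
    for _ in range(steps):
        _, g = score_and_grad(X, w)
        X += lr * (g - 2.0 * lam * X)  # gradient of the regularized objective
    return X


def decode(X):
    """Highest-weighted base per position; 'N' where no base is positive."""
    return "".join(BASES[int(np.argmax(row))] if row.max() > 0 else "N" for row in X)


w = 2.0 * one_hot("TATA") - 1.0  # hypothetical TATA-detecting filter
X_star = optimal_input(w, length=6)
print(decode(X_star))
```

In this toy setting the winning window converges to w / (2·lam), so decoding it recovers the motif the filter rewards, while positions the model ignores stay at zero and print as N.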
