Deep neural networks identify sequence context features predictive of transcription factor binding.

Authors

Zheng An, Lamkin Michael, Zhao Hanqing, Wu Cynthia, Su Hao, Gymrek Melissa

Affiliations

Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA USA.

Department of Bioengineering, University of California San Diego, La Jolla, CA USA.

Publication information

Nat Mach Intell. 2021 Feb;3(2):172-180. doi: 10.1038/s42256-020-00282-y. Epub 2021 Jan 18.

DOI:10.1038/s42256-020-00282-y

PMID:33796819

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8009085/

Abstract

Transcription factors (TFs) bind DNA by recognizing specific sequence motifs, typically of length 6-12bp. A motif can occur many thousands of times in the human genome, but only a subset of those sites are actually bound. Here we present a machine learning framework leveraging existing convolutional neural network architectures and model interpretation techniques to identify and interpret sequence context features most important for predicting whether a particular motif instance will be bound. We apply our framework to predict binding at motifs for 38 TFs in a lymphoblastoid cell line, score the importance of context sequences at base-pair resolution, and characterize context features most predictive of binding. We find that the choice of training data heavily influences classification accuracy and the relative importance of features such as open chromatin. Overall, our framework enables novel insights into features predictive of TF binding and is likely to inform future deep learning applications to interpret non-coding genetic variants.
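
As a rough illustration of what scoring context sequence "at base-pair resolution" can mean, the sketch below one-hot encodes a short window and assigns each position an importance value by in silico mutagenesis. This is only one common interpretation approach and is not necessarily the technique used in the paper. The function `toy_binding_score` is a hypothetical stand-in for a trained convolutional network, and the six-base window is invented for the example.

```python
# Illustrative sketch only. toy_binding_score is a hypothetical stand-in
# for a trained CNN; it is NOT the model described in the paper.

BASES = "ACGT"


def one_hot(seq):
    """Encode a DNA string as rows of four 0/1 values (order: A, C, G, T)."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]


def toy_binding_score(encoded):
    """Hypothetical scorer: fraction of G/C bases in the window (0 to 1)."""
    gc_count = sum(row[1] + row[2] for row in encoded)
    return gc_count / len(encoded)


def ism_importance(seq, score_fn):
    """In silico mutagenesis.

    For each position, return the mean drop in score when the reference
    base is replaced by each of the three alternative bases. A positive
    value means the reference base supports a high score.
    """
    seq = seq.upper()
    ref_score = score_fn(one_hot(seq))
    importance = []
    for i, ref_base in enumerate(seq):
        drops = []
        for alt in BASES:
            if alt == ref_base:
                continue
            mutated = seq[:i] + alt + seq[i + 1:]
            drops.append(ref_score - score_fn(one_hot(mutated)))
        importance.append(sum(drops) / len(drops))
    return importance


window = "ATGCGT"  # made-up context window, for illustration only
scores = ism_importance(window, toy_binding_score)
for base, value in zip(window, scores):
    print(f"{base}\t{value:+.3f}")
```

With a real model in place of the toy scorer, the same loop would yield one importance value per base of the context window around a motif instance, at the cost of three extra model evaluations per position.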
