De novo distillation of thermodynamic affinity from deep learning regulatory sequence models of in vivo protein-DNA binding.

Authors

Alexandari Amr M, Horton Connor A, Shrikumar Avanti, Shah Nilay, Li Eileen, Weilert Melanie, Pufall Miles A, Zeitlinger Julia, Fordyce Polly M, Kundaje Anshul

Affiliations

Department of Computer Science, Stanford University, Stanford, CA 94305.

Department of Genetics, Stanford University, Stanford, CA 94305.

Publication information

bioRxiv. 2023 May 11:2023.05.11.540401. doi: 10.1101/2023.05.11.540401.

DOI:10.1101/2023.05.11.540401
PMID:37214836
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10197627/
Abstract

Transcription factors (TFs) are proteins that bind DNA in a sequence-specific manner to regulate gene transcription. Despite their unique intrinsic sequence preferences, genomic occupancy profiles of TFs differ across cellular contexts. Hence, deciphering the sequence determinants of TF binding, both intrinsic and context-specific, is essential to understand gene regulation and the impact of regulatory, non-coding genetic variation. Biophysical models trained on in vitro TF binding assays can estimate intrinsic affinity landscapes and predict occupancy based on TF concentration and affinity. However, these models cannot adequately explain context-specific in vivo binding profiles. Conversely, deep learning models trained on in vivo TF binding assays effectively predict and explain genomic occupancy profiles as a function of complex regulatory sequence syntax, albeit without a clear biophysical interpretation. To reconcile these complementary models of in vitro and in vivo TF binding, we developed Affinity Distillation (AD), a method that extracts thermodynamic affinities from deep learning models of TF chromatin immunoprecipitation (ChIP) experiments by marginalizing away the influence of genomic sequence context. Applied to neural networks modeling diverse classes of yeast and mammalian TFs, AD predicts the energetic impacts of sequence variation within and surrounding motifs on TF binding, as measured by diverse in vitro assays, with superior dynamic range and accuracy compared to motif-based methods. Furthermore, AD can accurately discern affinities of TF paralogs. Our results highlight thermodynamic affinity as a key determinant of binding, suggest that deep learning models of binding implicitly learn high-resolution affinity landscapes, and show that these affinities can be successfully distilled using AD. This new biophysical interpretation of deep learning models enables high-throughput in silico experiments to explore the influence of sequence context and variation on both intrinsic affinity and occupancy.
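
The marginalization step named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the uniform random backgrounds, the choice to average the difference between outputs with and without the insert, and `toy_model` (a hypothetical stand-in for a trained ChIP model) are all assumptions made for illustration. The idea shown is only that a candidate site is inserted into many background sequences and the model's response is averaged, so that effects specific to any one sequence context cancel out.

```python
import random


def marginalized_score(model, motif, n_backgrounds=200, seq_len=60, seed=0):
    """Average change in model output when `motif` is inserted at the
    center of many random background sequences (illustrative only)."""
    rng = random.Random(seed)
    start = (seq_len - len(motif)) // 2
    total = 0.0
    for _ in range(n_backgrounds):
        # Assumption: uniform random backgrounds; a real analysis might
        # instead shuffle genomic sequences to preserve their composition.
        background = "".join(rng.choice("ACGT") for _ in range(seq_len))
        with_motif = (
            background[:start] + motif + background[start + len(motif):]
        )
        # Subtracting the background output isolates the motif's effect.
        total += model(with_motif) - model(background)
    return total / n_backgrounds


def toy_model(seq, site="CACGTG"):
    """Hypothetical stand-in for a trained network: the best number of
    positions matching `site` over all windows of the sequence."""
    k = len(site)
    return float(
        max(
            sum(a == b for a, b in zip(seq[i:i + k], site))
            for i in range(len(seq) - k + 1)
        )
    )


if __name__ == "__main__":
    for candidate in ("CACGTG", "CACGTT", "AAAAAA"):
        print(candidate, marginalized_score(toy_model, candidate))
```

In this toy setting the consensus site receives the largest marginalized score and weaker variants receive smaller ones. With a real model, such scores would still need to be calibrated against measured affinities before being read as energies.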

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/080f/10197627/292125426b37/nihpp-2023.05.11.540401v1-f0001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/080f/10197627/0eabf2088fc3/nihpp-2023.05.11.540401v1-f0002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/080f/10197627/64e673c70392/nihpp-2023.05.11.540401v1-f0003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/080f/10197627/b4edee9ecdfc/nihpp-2023.05.11.540401v1-f0004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/080f/10197627/3c7f4a7069fb/nihpp-2023.05.11.540401v1-f0005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/080f/10197627/252b7f422edd/nihpp-2023.05.11.540401v1-f0006.jpg

