Learning protein-DNA interaction landscapes by integrating experimental data through computational models.

Affiliations

Program in Computational Biology and Bioinformatics, Duke University, Durham, NC 27708, Knowledge Systems and Informatics, Lawrence Livermore National Laboratory, Livermore, CA 94550 and Department of Computer Science, Duke University, Durham, NC 27708, USA.

Publication information

Bioinformatics. 2014 Oct 15;30(20):2868-74. doi: 10.1093/bioinformatics/btu408. Epub 2014 Jun 27.

DOI:10.1093/bioinformatics/btu408

PMID:24974204

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4184258/

Abstract

MOTIVATION

Transcriptional regulation is directly enacted by the interactions between DNA and many proteins, including transcription factors (TFs), nucleosomes and polymerases. A critical step in deciphering transcriptional regulation is to infer, and eventually predict, the precise locations of these interactions, along with their strength and frequency. While recent datasets yield great insight into these interactions, individual data sources often provide only partial information regarding one aspect of the complete interaction landscape. For example, chromatin immunoprecipitation (ChIP) reveals the binding positions of a protein, but only for one protein at a time. In contrast, nucleases like MNase and DNase can be used to reveal binding positions for many different proteins at once, but cannot easily determine the identities of those proteins. Currently, few statistical frameworks jointly model these different data sources to reveal an accurate, holistic view of the in vivo protein-DNA interaction landscape.

RESULTS

Here, we develop a novel statistical framework that integrates different sources of experimental information within a thermodynamic model of competitive binding to jointly learn a holistic view of the in vivo protein-DNA interaction landscape. We show that our framework learns an interaction landscape with increased accuracy, explaining multiple sets of data in accordance with thermodynamic principles of competitive DNA binding. The resulting model of genomic occupancy provides a precise mechanistic vantage point from which to explore the role of protein-DNA interactions in transcriptional regulation.
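To make the idea of a thermodynamic model of competitive binding concrete, the following is a minimal sketch of how per-position occupancy can be computed when several DNA-binding factors compete for non-overlapping sites. It is not the authors' compete code or their inference procedure. It is a generic forward-backward (partition function) calculation, and the names `occupancy`, `weights`, and `widths` are hypothetical. It assumes the Boltzmann weight of each factor at each start position (concentration times binding affinity) is already given, and that an unbound base pair has weight 1.

```python
def occupancy(weights, widths, length):
    """Per-position occupancy for factors competing for non-overlapping sites.

    weights[k][i]: assumed statistical weight of factor k bound with its
                   left edge at position i (0-based), for 0 <= i <= length - widths[k].
    widths[k]:     footprint of factor k in base pairs.
    Returns occ[k][p], the probability that position p is covered by factor k.
    """
    n = length

    # fwd[j]: summed weight of all configurations covering positions 0..j-1.
    fwd = [0.0] * (n + 1)
    fwd[0] = 1.0
    for j in range(1, n + 1):
        total = fwd[j - 1]  # position j-1 left unbound (weight 1)
        for k, w in enumerate(widths):
            if j >= w:  # factor k occupies positions j-w..j-1
                total += fwd[j - w] * weights[k][j - w]
        fwd[j] = total

    # bwd[i]: summed weight of all configurations covering positions i..n-1.
    bwd = [0.0] * (n + 1)
    bwd[n] = 1.0
    for i in range(n - 1, -1, -1):
        total = bwd[i + 1]  # position i left unbound
        for k, w in enumerate(widths):
            if i + w <= n:  # factor k occupies positions i..i+w-1
                total += weights[k][i] * bwd[i + w]
        bwd[i] = total

    z = fwd[n]  # partition function over all configurations

    occ = [[0.0] * n for _ in widths]
    for k, w in enumerate(widths):
        for i in range(n - w + 1):
            # Probability that factor k is bound with its left edge at i.
            p_start = fwd[i] * weights[k][i] * bwd[i + w] / z
            for pos in range(i, i + w):
                occ[k][pos] += p_start
    return occ
```

As a hand-checkable case, two factors of width 2 competing for a 2 bp stretch with weights 3 and 1 yield occupancies of 0.6 and 0.2, because the partition function is 1 + 3 + 1 = 5. In a data-integration setting such as the one described above, the weights would be the quantities being learned so that the predicted occupancy agrees with the experimental data. That fitting step is outside this sketch.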

AVAILABILITY AND IMPLEMENTATION

The C source code for compete and Python source code for MCMC-based inference are available at http://www.cs.duke.edu/~amink.

CONTACT

amink@cs.duke.edu

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
