Learning protein-DNA interaction landscapes by integrating experimental data through computational models.

Author information

Program in Computational Biology and Bioinformatics, Duke University, Durham, NC 27708, Knowledge Systems and Informatics, Lawrence Livermore National Laboratory, Livermore, CA 94550 and Department of Computer Science, Duke University, Durham, NC 27708, USA.

Publication information

Bioinformatics. 2014 Oct 15;30(20):2868-74. doi: 10.1093/bioinformatics/btu408. Epub 2014 Jun 27.

Abstract

MOTIVATION

Transcriptional regulation is directly enacted by the interactions between DNA and many proteins, including transcription factors (TFs), nucleosomes and polymerases. A critical step in deciphering transcriptional regulation is to infer, and eventually predict, the precise locations of these interactions, along with their strength and frequency. While recent datasets yield great insight into these interactions, individual data sources often provide only partial information regarding one aspect of the complete interaction landscape. For example, chromatin immunoprecipitation (ChIP) reveals the binding positions of a protein, but only for one protein at a time. In contrast, nucleases like MNase and DNase can be used to reveal binding positions for many different proteins at once, but cannot easily determine the identities of those proteins. Currently, few statistical frameworks jointly model these different data sources to reveal an accurate, holistic view of the in vivo protein-DNA interaction landscape.

RESULTS

Here, we develop a novel statistical framework that integrates different sources of experimental information within a thermodynamic model of competitive binding to jointly learn a holistic view of the in vivo protein-DNA interaction landscape. We show that our framework learns an interaction landscape with increased accuracy, explaining multiple sets of data in accordance with thermodynamic principles of competitive DNA binding. The resulting model of genomic occupancy provides a precise mechanistic vantage point from which to explore the role of protein-DNA interactions in transcriptional regulation.
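To make the idea of a thermodynamic model of competitive binding concrete, the following is a minimal sketch, not the authors' compete implementation. It assumes a one-dimensional stretch of DNA on which each factor occupies a fixed-width footprint, footprints cannot overlap, an unbound base pair has statistical weight 1, and each placement has a Boltzmann weight equal to concentration times exp(-energy), with energy in units of kT. The names `boltzmann_weight` and `competitive_occupancy` and all numeric values are hypothetical, chosen only for illustration.

```python
import math


def boltzmann_weight(concentration, energy):
    """Statistical weight of one bound placement (energy in units of kT).

    Assumed form for illustration only: concentration * exp(-energy).
    """
    return concentration * math.exp(-energy)


def competitive_occupancy(weights, widths):
    """Per-position occupancy for factors competing on a 1D lattice.

    weights[k][i]: weight of factor k bound with its left edge at position i.
    widths[k]: footprint of factor k in base pairs.
    Returns (z, occ), where z is the partition function and occ[k][j] is the
    probability that position j is covered by factor k.
    """
    n = len(next(iter(weights.values())))

    # forward[i]: summed weight of all configurations of positions 0..i-1.
    forward = [0.0] * (n + 1)
    forward[0] = 1.0
    for i in range(1, n + 1):
        total = forward[i - 1]  # position i-1 left unbound (weight 1)
        for k, w in widths.items():
            start = i - w
            if start >= 0:  # factor k covers positions start..i-1
                total += forward[start] * weights[k][start]
        forward[i] = total

    # backward[i]: summed weight of all configurations of positions i..n-1.
    backward = [0.0] * (n + 1)
    backward[n] = 1.0
    for i in range(n - 1, -1, -1):
        total = backward[i + 1]  # position i left unbound
        for k, w in widths.items():
            if i + w <= n:  # factor k covers positions i..i+w-1
                total += weights[k][i] * backward[i + w]
        backward[i] = total

    z = forward[n]

    # Probability of each placement, spread over the positions it covers.
    occ = {k: [0.0] * n for k in widths}
    for k, w in widths.items():
        for start in range(n - w + 1):
            p = forward[start] * weights[k][start] * backward[start + w] / z
            for j in range(start, start + w):
                occ[k][j] += p
    return z, occ


# Toy example with made-up numbers: a narrow factor ("TF") with one
# favorable site competes with a wide factor ("NUC") for the same 3 bp.
example_widths = {"TF": 1, "NUC": 3}
example_weights = {
    "TF": [0.0, boltzmann_weight(2.0, -1.0), 0.0],
    "NUC": [boltzmann_weight(1.0, -0.5), 0.0, 0.0],
}
example_z, example_occ = competitive_occupancy(example_weights, example_widths)
```

Because every configuration is weighted jointly, raising the weight of one factor lowers the occupancy of its competitors at overlapping positions. In this sketch that coupling is what allows data about one protein, such as ChIP for a single TF, to constrain estimates for the others.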

AVAILABILITY AND IMPLEMENTATION

The C source code for compete and Python source code for MCMC-based inference are available at http://www.cs.duke.edu/~amink.

CONTACT

amink@cs.duke.edu

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
