Ramachandran Parameswaran, Palidwor Gareth A, Perkins Theodore J
Regenerative Medicine Program, Ottawa Hospital Research Institute, K1H 8L6 Ottawa, Canada ; Department of Biochemistry, Microbiology and Immunology, Faculty of Medicine, University of Ottawa, K1H 8M5 Ottawa, Canada.
Regenerative Medicine Program, Ottawa Hospital Research Institute, K1H 8L6 Ottawa, Canada.
Epigenetics Chromatin. 2015 Sep 17;8:33. doi: 10.1186/s13072-015-0028-2. eCollection 2015.
Unraveling transcriptional regulatory networks is a central problem in molecular biology and, in this quest, chromatin immunoprecipitation and sequencing (ChIP-seq) technology has given us the unprecedented ability to identify sites of protein-DNA binding and histone modification genome wide. However, multiple systemic and procedural biases hinder harnessing the full potential of this technology. Previous studies have addressed this problem, but a thorough characterization of different, interacting biases on ChIP-seq signals is still lacking.
Here, we present a novel framework where the genome-wide ChIP-seq signal is viewed as being quantifiably influenced by different, measurable sources of bias, which can then be computationally subtracted away. We use a compendium of 123 human ENCODE ChIP-seq datasets to build regression models that tell us how much of a ChIP-seq signal can be attributed to mappability, GC-content, chromatin accessibility, and factors represented in input DNA and IgG controls. When we use the model to separate out these non-binding influences from the ChIP-seq signal, we obtain a purified signal that associates better with TF-DNA-binding motifs than do other measures of peak significance. We also carry out a multiscale analysis that reveals how ChIP-seq signal biases differ across scales. Finally, we investigate previously reported associations between gene expression and ChIP-seq signals at transcription start sites. We show that our model can be used to discriminate ChIP-seq signals that are truly related to gene expression from those that are merely correlated by virtue of bias, in particular chromatin accessibility bias, which shows up in ChIP-seq signals and also relates to gene expression.
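The regress-and-subtract idea described above can be illustrated with a minimal sketch. This is not the authors' R or MATLAB implementation: it assumes a plain ordinary least-squares fit on binned signals, and the function names (`fit_bias_model`, `purify`, `r_squared`), the bin count, the covariate weights, and the synthetic data are all hypothetical choices made for illustration.

```python
import numpy as np


def fit_bias_model(signal, covariates):
    """Ordinary least-squares fit of signal ~ intercept + covariates."""
    X = np.column_stack([np.ones(len(signal)), covariates])
    coef, *_ = np.linalg.lstsq(X, signal, rcond=None)
    return coef


def purify(signal, covariates, coef):
    """Subtract the fitted bias contribution, leaving the residual signal."""
    X = np.column_stack([np.ones(len(signal)), covariates])
    return signal - X @ coef


def r_squared(signal, residual):
    """Fraction of signal variance attributed to the bias covariates."""
    ss_res = np.sum(residual ** 2)
    ss_tot = np.sum((signal - signal.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


# Synthetic data (assumption): one row per genomic bin, one column per bias
# source, in the order mappability, GC-content, chromatin accessibility,
# input DNA, IgG control.
rng = np.random.default_rng(0)
n_bins = 5000
covariates = rng.normal(size=(n_bins, 5))

# Hypothetical "true binding" component and hypothetical bias weights.
true_binding = rng.exponential(size=n_bins)
weights = np.array([0.5, 0.8, 1.2, 0.6, 0.3])
noise = rng.normal(scale=0.1, size=n_bins)
signal = covariates @ weights + true_binding + noise

coef = fit_bias_model(signal, covariates)
residual = purify(signal, covariates, coef)
explained = r_squared(signal, residual)

# Compare how well the raw and purified signals track the binding component.
corr_raw = np.corrcoef(signal, true_binding)[0, 1]
corr_purified = np.corrcoef(residual, true_binding)[0, 1]
```

In this toy setting the bias covariates are independent of the binding component, so the residual tracks binding much more closely than the raw signal does. In real data the covariates can overlap with genuine binding (accessible chromatin, for example), so a linear fit may also remove part of the signal of interest.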
Our study provides new insights into the behavior of ChIP-seq signal biases and proposes a novel mitigation framework that improves results compared to existing techniques. With ChIP-seq now being the central technology for studying transcriptional regulation, it is crucial to accurately characterize, quantify, and adjust for the genome-wide effects of biases affecting ChIP-seq. Our study also emphasizes that properly accounting for confounders in ChIP-seq data is essential for obtaining biologically accurate insights into the complex regulatory mechanisms of living organisms. R and MATLAB packages implementing the framework can be obtained from http://www.perkinslab.ca/Software.html.