Gomes Antonio L C, Wang Harris H
Department of Systems Biology, Columbia University, New York, New York, United States of America.
Department of Pathology and Cell Biology, Columbia University, New York, New York, United States of America.
PLoS Comput Biol. 2016 Apr 22;12(4):e1004891. doi: 10.1371/journal.pcbi.1004891. eCollection 2016 Apr.
ChIP-seq enables genome-scale identification of regulatory regions that govern gene expression. However, the biological insights generated from ChIP-seq analysis have been limited to predictions of binding sites and cooperative interactions. Furthermore, ChIP-seq data often correlate poorly with in vitro measurements or predicted motifs, highlighting that binding affinity alone is insufficient to explain transcription factor (TF) binding in vivo. One possibility is that binding sites are not equally accessible across the genome. A more comprehensive biophysical representation of TF binding is required to improve our ability to understand, predict, and alter gene expression. Here, we show that genome accessibility is a key parameter that impacts TF binding in bacteria. We developed a thermodynamic model that parameterizes ChIP-seq coverage in terms of genome accessibility and binding affinity. The role of genome accessibility is validated using a large-scale ChIP-seq dataset of the M. tuberculosis regulatory network. We find that accounting for genome accessibility yields a model that explains 63% of the ChIP-seq profile variance, while a model based on motif score alone explains only 35% of the variance. Moreover, our framework enables de novo ChIP-seq peak prediction and is useful for inferring TF-binding peaks in new experimental conditions, reducing the need for additional experiments. We observe that the genome is more accessible in intergenic regions, and that accessibility is positively correlated with gene expression and anti-correlated with distance to the origin of replication. Our biophysically motivated model provides a more comprehensive description of TF binding in vivo from first principles, towards a better representation of gene regulation in silico, with promising applications in systems biology.
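To illustrate how a thermodynamic model can combine accessibility and affinity, the following is a minimal sketch rather than the authors' implementation. It assumes a generic two-state occupancy form in which the motif score acts as a binding energy in units of kT and accessibility scales the Boltzmann weight. The function names, the parameter `mu`, and the synthetic data are all hypothetical and are not taken from the paper.

```python
import numpy as np

def predicted_occupancy(motif_score, accessibility, mu=0.0):
    """Two-state occupancy p = a*w / (1 + a*w), with w = exp(score - mu).

    Assumptions (not from the paper): motif_score is a log-odds score used
    as a negative binding energy in kT units, accessibility is a factor in
    [0, 1], and mu is a chemical-potential-like offset.
    """
    motif_score = np.asarray(motif_score, dtype=float)
    accessibility = np.asarray(accessibility, dtype=float)
    weight = accessibility * np.exp(motif_score - mu)
    return weight / (1.0 + weight)

def variance_explained(predicted, observed):
    """Squared Pearson correlation between predicted and observed coverage."""
    r = np.corrcoef(predicted, observed)[0, 1]
    return r ** 2

# Synthetic illustration only: coverage is generated from the same assumed
# model plus noise, so this shows the comparison, not a real result.
rng = np.random.default_rng(0)
scores = rng.normal(0.0, 2.0, 1000)    # hypothetical motif scores
access = rng.uniform(0.05, 1.0, 1000)  # hypothetical accessibility values
coverage = predicted_occupancy(scores, access) + rng.normal(0.0, 0.01, 1000)

r2_full = variance_explained(predicted_occupancy(scores, access), coverage)
r2_motif_only = variance_explained(predicted_occupancy(scores, 1.0), coverage)
print(f"accessibility + affinity: R^2 = {r2_full:.2f}")
print(f"motif score alone:        R^2 = {r2_motif_only:.2f}")
```

Because the synthetic coverage is drawn from the assumed model, the comparison only demonstrates how the two variance-explained figures would be computed. It does not reproduce the 63% and 35% values reported above.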