Chemical Engineering and Process Development Division, CSIR-National Chemical Laboratory, Pune 411008, India.
Nucleic Acids Res. 2013 Jan 7;41(1):21-32. doi: 10.1093/nar/gks950. Epub 2012 Oct 22.
High-throughput chromatin immunoprecipitation has become the method of choice for identifying genomic regions bound by a protein. Such regions are then investigated for overrepresented sequence motifs, the assumption being that they must correspond to the binding specificity of the profiled protein. However, this approach often fails: many bound regions do not contain the 'expected' motif. This is because binding DNA directly at its recognition site is not the only way the protein can cause the region to immunoprecipitate. Its binding specificity can change through association with different co-factors; it can bind DNA indirectly, through intermediaries; or it can even enforce its function through long-range chromosomal interactions. Conventional motif discovery methods, though largely capable of identifying overrepresented motifs from bound regions, lack the ability to characterize such diverse modes of protein-DNA binding and binding specificities. We present a novel Bayesian method that identifies distinct protein-DNA binding mechanisms without relying on any motif database. The method successfully identifies co-factors of proteins that do not bind DNA directly, such as mediator and p300. It also predicts literature-supported enhancer-promoter interactions. Even for well-studied direct-binding proteins, this method provides compelling evidence for previously uncharacterized dependencies within positions of binding sites, long-range chromosomal interactions and dimerization.