Wang Liguo, Chen Junsheng, Wang Chen, Uusküla-Reimand Liis, Chen Kaifu, Medina-Rivera Alejandra, Young Edwin J, Zimmermann Michael T, Yan Huihuang, Sun Zhifu, Zhang Yuji, Wu Stephen T, Huang Haojie, Wilson Michael D, Kocher Jean-Pierre A, Li Wei
Division of Biomedical Statistics and Informatics, Mayo Clinic, Rochester, MN 55905, USA; Division of Biostatistics, Dan L. Duncan Cancer Center and Department of Molecular and Cellular Biology, Baylor College of Medicine, Houston, TX 77030, USA;
School of Life Science and Technology, Tongji University, Shanghai 200092, China.
Nucleic Acids Res. 2014 Nov 10;42(20):e156. doi: 10.1093/nar/gku846. Epub 2014 Sep 23.
Understanding the role of a given transcription factor (TF) in regulating gene expression requires precise mapping of its binding sites in the genome. Chromatin immunoprecipitation-exo (ChIP-exo), an emerging technique that uses λ exonuclease to digest TF-unbound DNA after ChIP, is designed to reveal transcription factor binding site (TFBS) boundaries with near-single-nucleotide resolution. Although ChIP-exo promises deeper insights into transcription regulation, no dedicated bioinformatics tool exists to leverage its advantages. Most ChIP-seq and ChIP-chip analytic methods are not tailored for ChIP-exo and thus cannot take full advantage of high-resolution ChIP-exo data. Here we describe a novel analysis framework, termed MACE (model-based analysis of ChIP-exo), dedicated to ChIP-exo data analysis. The MACE workflow consists of four steps: (i) sequencing data normalization and bias correction; (ii) signal consolidation and noise reduction; (iii) single-nucleotide-resolution border peak detection using the Chebyshev inequality; and (iv) border matching using the Gale-Shapley stable matching algorithm. When applied to published human CTCF, yeast Reb1 and our own mouse ONECUT1/HNF6 ChIP-exo data, MACE is able to define TFBSs with high sensitivity, specificity and spatial resolution, as evidenced by multiple criteria including motif enrichment, sequence conservation, direct sequence pileup, nucleosome positioning and open chromatin states. In addition, we show that the fundamental advance of MACE is the identification of the two boundaries of a TFBS with high resolution, whereas other methods report only a single location for the same event. The two boundaries help elucidate the in vivo binding structure of a given TF, e.g. whether the TF may bind as a dimer or in a complex with other co-factors.
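Steps (iii) and (iv) can be illustrated with a minimal Python sketch. It is not MACE's implementation, and the abstract does not specify the details it fills in. The function names, the `p_cutoff` and `max_width` parameters, the use of the signal's own mean and standard deviation as background, and the rule that each border prefers the nearest admissible partner are all assumptions made for illustration. The first function uses the Chebyshev inequality, P(|X − μ| ≥ kσ) ≤ 1/k², to turn a significance cutoff into a distribution-free threshold; the second runs Gale-Shapley with left (forward-strand) borders proposing to right (reverse-strand) borders.

```python
import math
from statistics import mean, pstdev


def chebyshev_peaks(signal, p_cutoff=0.05):
    """Return indices whose coverage exceeds a Chebyshev-derived threshold.

    Assumption: background mean and standard deviation are estimated
    from the signal itself. Chebyshev bounds P(|X - mu| >= k*sigma) by
    1/k**2, so k = 1/sqrt(p_cutoff) gives a conservative cutoff that
    needs no assumption about the shape of the distribution.
    """
    mu = mean(signal)
    sigma = pstdev(signal)
    if sigma == 0:
        return []  # flat signal: nothing stands out
    k = 1.0 / math.sqrt(p_cutoff)
    threshold = mu + k * sigma
    return [i for i, x in enumerate(signal) if x >= threshold]


def match_borders(left, right, max_width=100):
    """Pair left and right border positions with Gale-Shapley.

    Assumptions (hypothetical preference model): positions within each
    list are distinct, a pair (l, r) is admissible only when
    0 < r - l <= max_width, and both sides prefer the narrower pairing.
    Returns a sorted list of (left, right) tuples; borders without an
    admissible partner stay unmatched.
    """
    # Each left border ranks admissible right borders by distance.
    prefs = {
        l: sorted((r for r in right if 0 < r - l <= max_width),
                  key=lambda r, l=l: r - l)
        for l in left
    }
    next_choice = {l: 0 for l in left}  # index of next proposal per left border
    engaged = {}                        # right border -> current left partner
    free = list(left)
    while free:
        l = free.pop()
        if next_choice[l] >= len(prefs[l]):
            continue  # preference list exhausted; l remains unmatched
        r = prefs[l][next_choice[l]]
        next_choice[l] += 1
        current = engaged.get(r)
        if current is None:
            engaged[r] = l
        elif r - l < r - current:
            # r prefers the closer proposer and releases its old partner.
            engaged[r] = l
            free.append(current)
        else:
            free.append(l)  # rejected; l will try its next choice
    return sorted((l, r) for r, l in engaged.items())


if __name__ == "__main__":
    coverage = [1] * 99 + [100]  # toy per-base coverage with one spike
    print(chebyshev_peaks(coverage))
    print(match_borders([100, 130, 500], [120, 160, 530, 900]))
```

Because the Chebyshev bound holds for any distribution with finite variance, the threshold is conservative: it will call fewer borders than a parametric test at the same nominal cutoff. The matching is stable in the usual Gale-Shapley sense, meaning no unmatched left-right pair would both prefer each other over their assigned partners under the assumed distance-based preferences.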