Department of Biochemistry & Molecular Biology and Center for Eukaryotic Gene Regulation, The Pennsylvania State University, University Park, PA, USA.
Bioinformatics. 2019 Mar 15;35(6):903-913. doi: 10.1093/bioinformatics/bty703.
Regulatory proteins associate with the genome either by directly binding cognate DNA motifs or via protein-protein interactions with other regulators. Each recruitment mechanism may be associated with distinct motifs and may also produce characteristic patterns in high-resolution protein-DNA binding assays. For example, the ChIP-exo protocol precisely characterizes protein-DNA crosslinking patterns by combining chromatin immunoprecipitation (ChIP) with 5' → 3' exonuclease digestion. Since different regulatory complexes produce different protein-DNA crosslinking signatures, analysis of ChIP-exo tag enrichment patterns should enable detection of multiple protein-DNA binding modes for a given regulatory protein. However, current ChIP-exo analysis methods either treat all binding events as being of a single uniform type or rely on motifs alone to cluster binding events into subtypes.
To systematically detect multiple protein-DNA interaction modes in a single ChIP-exo experiment, we introduce the ChIP-exo mixture model (ChExMix). ChExMix probabilistically models the genomic locations and subtype memberships of binding events using both ChIP-exo tag distribution patterns and DNA motifs. We demonstrate that ChExMix achieves accurate detection and classification of binding event subtypes using in silico mixed ChIP-exo data. We further demonstrate the unique analysis abilities of ChExMix using a collection of ChIP-exo experiments that profile the binding of key transcription factors in MCF-7 cells. In these data, ChExMix identifies possible recruitment mechanisms of FoxA1 and ERα, thus demonstrating that ChExMix can effectively stratify ChIP-exo binding events into biologically meaningful subtypes.
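To make the idea of classifying binding events by tag distribution pattern concrete, the following is a minimal sketch of expectation-maximization for a mixture of multinomial tag profiles. It is not the ChExMix implementation: it ignores DNA motifs and binding-event positioning, and the function name `em_tag_profile_mixture`, the toy tag counts, the pseudocount, and the initialization scheme are all assumptions made for illustration.

```python
import math


def em_tag_profile_mixture(counts, n_subtypes, n_iter=50, pseudocount=0.5):
    """Fit a mixture of multinomial tag profiles with EM (illustrative only).

    counts[i][j] is the number of tags at relative position j around
    binding event i. Returns (weights, profiles, responsibilities).
    """
    n_events = len(counts)
    n_pos = len(counts[0])

    # Assumption: initialise each subtype profile from an evenly spaced
    # event. This is a simple deterministic choice for the toy example;
    # real data would need a more careful initialisation.
    profiles = []
    for k in range(n_subtypes):
        idx = (k * (n_events - 1)) // max(n_subtypes - 1, 1)
        raw = [c + pseudocount for c in counts[idx]]
        total = sum(raw)
        profiles.append([v / total for v in raw])
    weights = [1.0 / n_subtypes] * n_subtypes

    resp = []
    for _ in range(n_iter):
        # E-step: posterior probability of each subtype for each event,
        # computed in log space to avoid underflow.
        resp = []
        for x in counts:
            log_post = [
                math.log(weights[k])
                + sum(c * math.log(profiles[k][j]) for j, c in enumerate(x))
                for k in range(n_subtypes)
            ]
            m = max(log_post)
            unnorm = [math.exp(v - m) for v in log_post]
            z = sum(unnorm)
            resp.append([u / z for u in unnorm])

        # M-step: re-estimate mixing weights and per-subtype tag profiles
        # from the responsibility-weighted tag counts.
        for k in range(n_subtypes):
            mass = sum(r[k] for r in resp)
            weights[k] = max(mass / n_events, 1e-12)  # floor keeps log() finite
            raw = [
                pseudocount + sum(r[k] * x[j] for r, x in zip(resp, counts))
                for j in range(n_pos)
            ]
            total = sum(raw)
            profiles[k] = [v / total for v in raw]

    return weights, profiles, resp


# Toy data (invented): three events with tags piled on the left of the
# window, then three with tags piled on the right.
counts = [
    [9, 7, 2, 1, 0, 1],
    [8, 8, 1, 0, 1, 0],
    [10, 6, 2, 0, 0, 1],
    [1, 0, 1, 2, 7, 9],
    [0, 1, 0, 1, 8, 8],
    [1, 0, 0, 2, 6, 10],
]
weights, profiles, resp = em_tag_profile_mixture(counts, n_subtypes=2)

# Hard subtype label per event: the subtype with the largest responsibility.
labels = [max(range(len(r)), key=lambda k: r[k]) for r in resp]
print(labels)
```

The responsibilities play the role of soft subtype memberships. In a fuller model, a motif-based term could enter the same E-step as an additional factor per subtype, which is one way to read the abstract's statement that both tag patterns and motifs inform the assignment.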
ChExMix is available from https://github.com/seqcode/chexmix.
Supplementary data are available at Bioinformatics online.