Computational Biology Unit, Institut de Recherche Clinique de Montreal, Montreal, Canada.
PLoS One. 2011 Feb 16;6(2):e16432. doi: 10.1371/journal.pone.0016432.
ChIP-Seq has become the standard method for genome-wide profiling of the DNA association of transcription factors. To simplify the analysis and interpretation of ChIP-Seq data, which typically requires multiple applications, we describe an integrated, open source, R-based analysis pipeline. The pipeline covers data input, peak detection, sequence and motif analysis, visualization, and data export, and can readily be extended with other R and Bioconductor packages. On a standard multicore computer, it can handle datasets consisting of tens of thousands of enriched regions. We demonstrate its effectiveness on published human ChIP-Seq datasets for FOXA1, ER, CTCF and STAT1, where it detected co-occurring motifs that were consistent with the literature but were not detected by other methods. Our pipeline provides the first complete set of Bioconductor tools for sequence and motif analysis of ChIP-Seq and ChIP-chip data.
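To make the idea of detecting co-occurring motifs concrete, here is a minimal sketch of one common approach: score each peak sequence against two position weight matrices and count the peaks in which both motifs reach a score threshold. It is not the pipeline described above, which is R-based; it is written in Python purely for illustration. The consensus strings, peak sequences, threshold, and all function names are hypothetical toy values rather than data or code from the paper.

```python
import math

BASES = "ACGT"

def consensus_counts(consensus, weight=10):
    """Build a toy count matrix: `weight` observations of the consensus base per column."""
    return [{b: (weight if b == base else 0) for b in BASES} for base in consensus]

def log_odds_pwm(counts, pseudocount=1.0, background=0.25):
    """Convert per-position base counts into log2-odds scores against a uniform background."""
    pwm = []
    for column in counts:
        total = sum(column[b] for b in BASES) + 4 * pseudocount
        pwm.append({b: math.log2(((column[b] + pseudocount) / total) / background)
                    for b in BASES})
    return pwm

def best_hit(sequence, pwm):
    """Return (score, offset) of the best-scoring window, or None if no window can be scored."""
    width = len(pwm)
    best = None
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if any(base not in BASES for base in window):
            continue  # skip windows containing N or other ambiguity codes
        score = sum(pwm[i][base] for i, base in enumerate(window))
        if best is None or score > best[0]:
            best = (score, start)
    return best

def cooccurrence(peaks, pwm_a, pwm_b, threshold):
    """Count peaks whose best hit for both motifs reaches the threshold."""
    both = 0
    for seq in peaks:
        hit_a = best_hit(seq.upper(), pwm_a)
        hit_b = best_hit(seq.upper(), pwm_b)
        if hit_a and hit_b and hit_a[0] >= threshold and hit_b[0] >= threshold:
            both += 1
    return both

# Hypothetical toy motifs and peak sequences (assumptions for illustration only).
pwm_a = log_odds_pwm(consensus_counts("TGTT"))
pwm_b = log_odds_pwm(consensus_counts("GGTCA"))
peaks = ["ACGTGTTTACGGTCAAT", "CCCTGTTTCCC", "AAGGTCAAA", "acgTGTTaaGGTCA"]

n_both = cooccurrence(peaks, pwm_a, pwm_b, threshold=5.0)
print(f"{n_both} of {len(peaks)} peaks contain both motifs")
```

A real analysis would also scan the reverse strand, use motifs from a curated database, and compare the co-occurrence count against a background set of sequences to judge significance; those steps are omitted here to keep the sketch short.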