Department of Statistics, University of Wisconsin, Madison, Wisconsin, United States of America.
PLoS Comput Biol. 2011 Jul;7(7):e1002111. doi: 10.1371/journal.pcbi.1002111. Epub 2011 Jul 14.
Chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) is rapidly replacing chromatin immunoprecipitation combined with genome-wide tiling array analysis (ChIP-chip) as the preferred approach for mapping transcription-factor binding sites and chromatin modifications. The state of the art for analyzing ChIP-seq data relies on using only reads that map uniquely to a relevant reference genome (uni-reads). This can lead to the omission of up to 30% of alignable reads. We describe a general approach for utilizing reads that map to multiple locations on the reference genome (multi-reads). Our approach is based on allocating multi-reads as fractional counts using a weighted alignment scheme. Using human STAT1 and mouse GATA1 ChIP-seq datasets, we illustrate that incorporation of multi-reads significantly increases sequencing depths, leads to detection of novel peaks that are not otherwise identifiable with uni-reads, and improves detection of peaks in mappable regions. We investigate various genome-wide characteristics of peaks detected only by utilization of multi-reads via computational experiments. Overall, peaks from multi-read analysis have similar characteristics to peaks that are identified by uni-reads except that the majority of them reside in segmental duplications. We further validate a number of GATA1 multi-read only peaks by independent quantitative real-time ChIP analysis and identify novel target genes of GATA1. These computational and experimental results establish that multi-reads can be of critical importance for studying transcription factor binding in highly repetitive regions of genomes with ChIP-seq experiments.
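The idea of allocating multi-reads as fractional counts can be illustrated with a small sketch. This is not the authors' implementation; the abstract does not specify the weighting scheme. It assumes a simple iterative rule in which each multi-read is split across its candidate locations in proportion to the current read count at each location. The function names, the integer bin positions, the `pseudocount` and the number of iterations are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def fractional_counts(uni_reads, multi_reads, weights):
    """Sum whole counts for uni-reads and fractional counts for multi-reads."""
    counts = defaultdict(float)
    for pos in uni_reads:
        counts[pos] += 1.0  # a uni-read contributes one full count
    for candidates, read_weights in zip(multi_reads, weights):
        for pos, w in zip(candidates, read_weights):
            counts[pos] += w  # a multi-read contributes a fraction per location
    return counts


def allocate_multi_reads(uni_reads, multi_reads, n_iter=20, pseudocount=1.0):
    """Iteratively split each multi-read across its candidate locations.

    Assumed rule (illustrative only): a location's weight is proportional
    to its current count plus a pseudocount. Each multi-read must list at
    least one candidate location.
    """
    # Start from a uniform split over each read's candidate locations.
    weights = [[1.0 / len(c)] * len(c) for c in multi_reads]
    for _ in range(n_iter):
        counts = fractional_counts(uni_reads, multi_reads, weights)
        weights = []
        for candidates in multi_reads:
            scores = [counts[pos] + pseudocount for pos in candidates]
            total = sum(scores)
            weights.append([s / total for s in scores])
    return fractional_counts(uni_reads, multi_reads, weights), weights


# Toy input: three uni-reads in bin 10, one in bin 50,
# and one multi-read that aligns equally well to both bins.
uni = [10, 10, 10, 50]
multi = [[10, 50]]
counts, weights = allocate_multi_reads(uni, multi)
```

In this toy input, most of the multi-read's weight ends up in bin 10, which already has more uni-read support, and the fractional counts still sum to the total number of reads, so sequencing depth increases without counting any read twice.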