Zhang Qi, Zeng Xin, Younkin Sam, Kawli Trupti, Snyder Michael P, Keleş Sündüz
Department of Statistics, University of Nebraska Lincoln, Lincoln, Nebraska, USA.
Department of Statistics, University of Wisconsin Madison, Madison, Wisconsin, USA.
BMC Bioinformatics. 2016 Feb 24;17:96. doi: 10.1186/s12859-016-0957-1.
Chromatin immunoprecipitation followed by sequencing (ChIP-seq) experiments have revolutionized genome-wide profiling of transcription factors and histone modifications. Although maturing sequencing technologies allow these experiments to be carried out with short (36-50 bp) or long (75-100 bp) reads, in single-end or paired-end mode, the impact of these read parameters on downstream data analysis is not well understood. In this paper, we evaluate the effects of different read parameters on genome sequence alignment, coverage of different classes of genomic features, peak identification, and allele-specific binding detection.
We generated 101 bp paired-end ChIP-seq data for many transcription factors from the human GM12878 and MCF7 cell lines. Systematic evaluations using in silico variations of these data, as well as fully simulated data, revealed a complex interplay between the sequencing parameters and analysis tools, and indicated clear advantages of paired-end designs in several aspects, such as alignment accuracy, peak resolution, and, most notably, allele-specific binding detection.
Our work elucidates the effect of sequencing design on downstream analysis and provides insights to investigators deciding on sequencing parameters for ChIP-seq experiments. We present the first systematic evaluation of the impact of ChIP-seq designs on allele-specific binding detection and highlight the power of paired-end designs in such studies.
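The "in silico variations" mentioned above can be pictured as deriving shorter or single-end datasets from the original long paired-end reads. The following is a minimal sketch of that idea, not the authors' pipeline: the function name `derive_design`, the tuple representation of a read pair, and the choice to keep the first bases of each read are all illustrative assumptions, since the abstract does not describe the exact procedure.

```python
from typing import List, Optional, Tuple

# A read pair is represented here simply as (mate1 sequence, mate2 sequence).
# This representation is an assumption made for illustration only.
ReadPair = Tuple[str, str]
DerivedRead = Tuple[str, Optional[str]]


def derive_design(
    pairs: List[ReadPair], read_length: int, paired_end: bool
) -> List[DerivedRead]:
    """Derive a shorter and/or single-end dataset from paired-end reads.

    Each read is truncated to its first `read_length` bases. For a
    single-end design the second mate is dropped (stored as None).
    """
    if read_length <= 0:
        raise ValueError("read_length must be positive")
    derived: List[DerivedRead] = []
    for mate1, mate2 in pairs:
        trimmed1 = mate1[:read_length]
        trimmed2 = mate2[:read_length] if paired_end else None
        derived.append((trimmed1, trimmed2))
    return derived


# Toy 101-base read pair standing in for real ChIP-seq reads (made-up data).
example_pairs = [("A" * 50 + "C" * 51, "G" * 60 + "T" * 41)]

single_end_36 = derive_design(example_pairs, read_length=36, paired_end=False)
paired_end_75 = derive_design(example_pairs, read_length=75, paired_end=True)
```

Each derived dataset can then be passed through the same alignment and peak-calling steps, so that differences in the results are attributable to read length and pairing rather than to differences between experiments.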