Department of Animal and Avian Sciences, University of Maryland, College Park, MD, USA.
PLoS One. 2012;7(9):e45486. doi: 10.1371/journal.pone.0045486. Epub 2012 Sep 28.
Chromatin immunoprecipitation followed by next-generation sequencing (ChIP-Seq) is a genome-wide analysis technique that can be used to detect various epigenetic phenomena, such as transcription factor binding sites and histone modifications. Histone modification profiles can be either punctate or diffuse, which makes it difficult to distinguish regions of enrichment from background noise. With the discovery of histone marks having a wide variety of enrichment patterns, there is an urgent need for analysis methods that are robust to various data characteristics and capable of detecting a broad range of enrichment patterns.
To address these challenges, we propose WaveSeq, a novel data-driven method for detecting regions of significant enrichment in ChIP-Seq data. Our approach utilizes the wavelet transform, is free of distributional assumptions, and is robust to diverse data characteristics such as low signal-to-noise ratios and broad enrichment patterns. Using publicly available data sets, we showed that WaveSeq compares favorably with other published methods, exhibiting high sensitivity and precision for both punctate and diffuse enrichment regions even in the absence of a control data set. Applying our algorithm to a complex histone modification data set led to novel functional discoveries, which further underscored its utility in such an experimental setup.
WaveSeq is a highly sensitive method capable of accurate identification of enriched regions in a broad range of data sets. WaveSeq can detect both narrow and broad peaks with a high degree of accuracy even in low signal-to-noise ratio data sets. WaveSeq is also suited for application in complex experimental scenarios, helping make biologically relevant functional discoveries.
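As a rough illustration of the general idea described above (a wavelet transform of the read-count signal combined with a data-driven, distribution-free threshold), the following minimal sketch may help. It is not the published WaveSeq algorithm: the Haar wavelet, the permutation-based threshold, the default parameters, and all function names (`haar_step`, `haar_approx`, `permutation_threshold`, `call_regions`) are assumptions chosen only to keep the example short and self-contained.

```python
import random

def haar_step(signal):
    """One level of the orthonormal Haar transform: (approximation, detail)."""
    s = 2 ** 0.5
    pairs = range(0, len(signal) - 1, 2)  # an odd trailing element is dropped
    approx = [(signal[i] + signal[i + 1]) / s for i in pairs]
    detail = [(signal[i] - signal[i + 1]) / s for i in pairs]
    return approx, detail

def haar_approx(signal, levels):
    """Approximation coefficients after `levels` Haar steps.

    Each coefficient summarizes 2**levels consecutive windows, so the
    input is assumed to be at least that long.
    """
    approx = list(signal)
    for _ in range(levels):
        approx, _detail = haar_step(approx)
    return approx

def permutation_threshold(counts, levels, n_perm=200, quantile=0.95, seed=0):
    """Empirical null threshold (illustrative assumption, not from the paper).

    Shuffling the windows destroys spatial clustering; the threshold is a
    quantile of the largest coefficient seen in each shuffle, so no
    parametric read-count distribution is assumed.
    """
    rng = random.Random(seed)
    shuffled = list(counts)
    maxima = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        maxima.append(max(haar_approx(shuffled, levels)))
    maxima.sort()
    return maxima[min(len(maxima) - 1, int(quantile * len(maxima)))]

def call_regions(counts, levels=2, **kwargs):
    """Return merged (start, end) window ranges, end exclusive, whose
    Haar approximation coefficient exceeds the permutation threshold."""
    threshold = permutation_threshold(counts, levels, **kwargs)
    width = 2 ** levels
    regions = []
    for i, coeff in enumerate(haar_approx(counts, levels)):
        if coeff > threshold:
            start, end = i * width, (i + 1) * width
            if regions and regions[-1][1] == start:
                regions[-1] = (regions[-1][0], end)  # extend adjacent region
            else:
                regions.append((start, end))
    return regions

# Hypothetical per-window read counts: flat background with one broad block.
counts = [1] * 64 + [20] * 16 + [1] * 176
regions = call_regions(counts)
```

Larger values of `levels` favor broad, diffuse domains, while smaller values favor narrow peaks; a practical tool would examine several scales rather than a single fixed one.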