Department of Molecular Virology, Immunology & Medical Genetics, Ohio State University, Columbus, OH 43210, USA.
Bioinformatics. 2009 Sep 15;25(18):2334-40. doi: 10.1093/bioinformatics/btp384. Epub 2009 Jun 26.
Chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) is a relatively new antibody-based method for studying the binding patterns of specific proteins across the entire genome. ChIP-seq allows scientists to obtain more comprehensive results in less time. Here, we present a non-linear normalization algorithm and a mixture modeling method for comparing ChIP-seq data from multiple samples and characterizing genes based on their RNA polymerase II (Pol II) binding patterns.
We apply a two-step non-linear normalization method based on locally weighted regression (LOESS) to compare ChIP-seq data across multiple samples, and we model the differences using an Exponential-Normal(K) mixture model. The fitted model is used to identify genes associated with differential binding sites based on the local false discovery rate (fdr). These genes are then standardized and hierarchically clustered to characterize their Pol II binding patterns. As a case study, we apply the analysis procedure to compare a normal (non-resistant) breast cancer cell line (MCF7) with a tamoxifen-resistant (OHT) cell line. We find enriched regions that are associated with cancer (P < 0.0001). Our findings also suggest that cell cycle and gene expression control pathways may be dysregulated in the tamoxifen-resistant cells. These results show that the non-linear normalization method can be used to analyze ChIP-seq data across multiple samples.
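The local fdr step can be illustrated with a minimal sketch. It assumes that the normal components of the mixture describe non-differential binding and that two exponential components, one on each tail, describe differential binding. That split, the function names, the gene names, and all parameter values are illustrative assumptions and not the fitted model from this work. Under those assumptions, the local fdr at a normalized difference x is the null density divided by the total mixture density.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and std dev sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def exp_pdf(x, rate):
    """Density of an exponential distribution, defined for x >= 0."""
    return rate * math.exp(-rate * x) if x >= 0 else 0.0

def local_fdr(x, normals, pos_exp, neg_exp):
    """Local fdr at x under an assumed Exponential-Normal(K) mixture.

    normals : list of (weight, mu, sigma), assumed null (non-differential)
    pos_exp : (weight, rate) for the positive tail (assumed differential)
    neg_exp : (weight, rate) for the negative tail (assumed differential)
    """
    null = sum(w * normal_pdf(x, mu, s) for w, mu, s in normals)
    if x >= 0:
        alt = pos_exp[0] * exp_pdf(x, pos_exp[1])
    else:
        alt = neg_exp[0] * exp_pdf(-x, neg_exp[1])
    total = null + alt
    # If both densities underflow to zero, fall back to the conservative value.
    return null / total if total > 0 else 1.0

# Hypothetical mixture parameters (weights sum to 1); not fitted to real data.
NORMALS = [(0.9, 0.0, 0.5)]
POS_EXP = (0.05, 0.5)
NEG_EXP = (0.05, 0.5)

# Hypothetical normalized Pol II binding differences per gene.
scores = {"geneA": 0.1, "geneB": 3.2, "geneC": -2.8}

# Keep genes whose local fdr falls below an illustrative cutoff.
selected = sorted(
    g for g, x in scores.items()
    if local_fdr(x, NORMALS, POS_EXP, NEG_EXP) < 0.05
)
print(selected)
```

In this sketch a gene near zero difference is dominated by the normal component and receives a local fdr close to 1, while genes far out in either tail are dominated by an exponential component and fall below the cutoff. In practice the component weights and parameters would be estimated from the normalized data, for example with an EM procedure.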
Data are available at http://www.bmi.osu.edu/~khuang/Data/ChIP/RNAPII/.