A novel statistical method for quantitative comparison of multiple ChIP-seq datasets.

Authors

Chen Li, Wang Chi, Qin Zhaohui S, Wu Hao

Affiliations

Department of Mathematics and Computer Science, Atlanta, GA 30322, USA; Department of Biostatistics and Markey Cancer Center, University of Kentucky, Lexington, KY 40536, USA; Department of Biostatistics and Bioinformatics, Emory University, Atlanta, GA 30322, USA; and Department of Biomedical Informatics, Emory University, Atlanta, GA 30322, USA.

Publication information

Bioinformatics. 2015 Jun 15;31(12):1889-96. doi: 10.1093/bioinformatics/btv094. Epub 2015 Feb 13.

DOI:10.1093/bioinformatics/btv094

PMID:25682068

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4542775/

Abstract

MOTIVATION

ChIP-seq (chromatin immunoprecipitation followed by sequencing) is a powerful technology for measuring protein binding or histone modification strength on a genome-wide scale. Although a number of methods are available for analyzing a single ChIP-seq dataset (e.g. 'peak detection'), rigorous statistical methods for quantitatively comparing multiple ChIP-seq datasets, accounting for control-experiment data, signal-to-noise ratios, biological variation and multiple-factor experimental designs, remain under-developed.

RESULTS

In this work, we develop a statistical method to quantitatively compare multiple ChIP-seq datasets and detect genomic regions showing differential protein binding or histone modification. We first detect peaks in every dataset and then take their union to form a single set of candidate regions. The read counts from the IP experiment at the candidate regions are assumed to follow a Poisson distribution. The underlying Poisson rates are modeled as an experiment-specific function of artifacts and biological signals. We then estimate the biological signals and compare them by hypothesis testing in a linear model framework. Simulations and real data analyses demonstrate that the proposed method provides more accurate and robust results than existing methods.
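
The first step described above, taking the union of the peaks from all datasets to form one set of candidate regions, can be illustrated with a minimal Python sketch. This is not the ChIPComp implementation: the function name `union_peaks`, the `(chromosome, start, end)` tuple layout, the half-open coordinate convention and the choice to merge touching intervals are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical peak representation: (chromosome, start, end), half-open.
Peak = Tuple[str, int, int]


def union_peaks(peak_sets: List[List[Peak]]) -> List[Peak]:
    """Merge overlapping or touching peaks from all datasets into candidate regions."""
    # Pool the intervals from every dataset, grouped by chromosome.
    by_chrom: Dict[str, List[Tuple[int, int]]] = {}
    for peaks in peak_sets:
        for chrom, start, end in peaks:
            by_chrom.setdefault(chrom, []).append((start, end))

    merged: List[Peak] = []
    for chrom in sorted(by_chrom):
        intervals = sorted(by_chrom[chrom])
        cur_start, cur_end = intervals[0]
        for start, end in intervals[1:]:
            if start > cur_end:
                # Gap between intervals: close the current region, start a new one.
                merged.append((chrom, cur_start, cur_end))
                cur_start, cur_end = start, end
            else:
                # Overlapping or touching: extend the current region.
                cur_end = max(cur_end, end)
        merged.append((chrom, cur_start, cur_end))
    return merged


# Two toy datasets with one overlapping peak on chr1.
dataset_a = [("chr1", 100, 200), ("chr1", 500, 600)]
dataset_b = [("chr1", 150, 250), ("chr2", 10, 50)]
candidates = union_peaks([dataset_a, dataset_b])
print(candidates)
# [('chr1', 100, 250), ('chr1', 500, 600), ('chr2', 10, 50)]
```

IP read counts would then be tallied over each candidate region in every dataset before the Poisson modeling step; that step is not sketched here because the abstract does not give the functional form of the rates.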

AVAILABILITY AND IMPLEMENTATION

An R software package, ChIPComp, is freely available at http://web1.sph.emory.edu/users/hwu30/software/ChIPComp.html.
