School of Electrical Engineering and Computer Science, University of Ottawa, Ottawa, K1N6N5, Canada.
Regenerative Medicine Program, Ottawa Hospital Research Institute, Ottawa, K1H8L6, Canada.
BMC Bioinformatics. 2021 Feb 15;22(1):69. doi: 10.1186/s12859-020-03927-2.
Chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq), introduced more than a decade ago, is widely used by the scientific community to detect protein/DNA binding and histone modifications across the genome. Every experiment is prone to noise and bias, and ChIP-seq experiments are no exception. To alleviate bias, incorporating control datasets into ChIP-seq analysis is an essential step. The controls account for the background signal, while the remainder of the ChIP-seq signal captures true binding or histone modification. However, a recurrent issue is that different ChIP-seq experiments are affected by different types of bias. Depending on which controls are used, different aspects of ChIP-seq bias are accounted for better or worse, and peak calling can produce different results for the same ChIP-seq experiment. Consequently, generating "smart" controls, which model the non-signal effect for a specific ChIP-seq experiment, could enhance contrast and increase the reliability and reproducibility of the results.
We propose a peak calling algorithm, Weighted Analysis of ChIP-seq (WACS), which is an extension of the well-known peak caller MACS2. WACS has two main steps. First, weights are estimated for each control using non-negative least squares regression, with the goal of customizing the controls to model the noise distribution of each ChIP-seq experiment. Peak calling then follows. We demonstrate that WACS significantly outperforms MACS2 and AIControl, another recent algorithm for generating smart controls, in the detection of enriched regions along the genome, as measured by motif enrichment and reproducibility analyses.
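The weight-estimation step can be illustrated with a minimal sketch. This is not the WACS implementation: it assumes that the ChIP-seq sample and each control have already been summarized as read counts over the same genomic bins, and it fits the non-negative weights with SciPy's `scipy.optimize.nnls`. The function name `estimate_control_weights`, the bin counts, and the simulated data are all hypothetical and chosen only for illustration.

```python
import numpy as np
from scipy.optimize import nnls


def estimate_control_weights(chip_counts, control_counts):
    """Fit non-negative weights w minimizing ||control_counts @ w - chip_counts||_2.

    chip_counts:    array of shape (n_bins,), binned read counts of the ChIP-seq sample
    control_counts: array of shape (n_bins, n_controls), binned read counts per control
    Returns the weight vector and the residual norm reported by nnls.
    """
    weights, residual_norm = nnls(control_counts, chip_counts)
    return weights, residual_norm


# Simulated data (assumption): three controls with different mean coverage.
rng = np.random.default_rng(0)
n_bins, n_controls = 2000, 3
control_counts = rng.poisson(lam=[5.0, 8.0, 3.0], size=(n_bins, n_controls)).astype(float)

# Background built as a known mixture of the controls, so the fit can be checked.
true_weights = np.array([0.5, 0.0, 2.0])
chip_counts = control_counts @ true_weights

weights, residual_norm = estimate_control_weights(chip_counts, control_counts)

# The weighted combination is the customized control for this experiment.
weighted_control = control_counts @ weights
```

In this noise-free toy case the fit recovers the mixture used to build the background. With real data the ChIP-seq counts also contain true signal, so the fitted combination only approximates the background component.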
These results improve our understanding of ChIP-seq controls and their biases, and show that WACS yields a better approximation of the noise distribution in controls.