Department of Biology, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA.
Plant Cell. 2022 Nov 29;34(12):4795-4815. doi: 10.1093/plcell/koac282.
Chromatin immunoprecipitation followed by sequencing (ChIP-seq) is widely used to identify factor binding to genomic DNA as well as chromatin modifications. ChIP-seq data analysis is affected by genomic regions that generate ultra-high artifactual signals. To remove these signals from ChIP-seq data, the Encyclopedia of DNA Elements (ENCODE) project developed comprehensive sets of regions, called blacklists, defined by low mappability and ultra-high signals for human, mouse (Mus musculus), nematode (Caenorhabditis elegans), and fruit fly (Drosophila melanogaster). However, blacklists are not currently available for many model and nonmodel species. Here, we describe an alternative approach, called greenscreen, for removing false-positive peaks. Greenscreen is easy to implement, requires few input samples, and uses analysis tools frequently employed for ChIP-seq. Greenscreen removes artifactual signals as effectively as blacklists in Arabidopsis thaliana and human ChIP-seq datasets while covering less of the genome, and it dramatically improves ChIP-seq peak calling and downstream analyses. Greenscreen filtering reveals true factor binding overlap and occupancy changes in different genetic backgrounds or tissues. Because it is effective with as few as two inputs, greenscreen is readily adaptable for use in any species or genome build. Although developed for ChIP-seq, greenscreen also identifies artifactual signals from other genomic datasets, including Cleavage Under Targets and Release Using Nuclease (CUT&RUN). We present an improved ChIP-seq pipeline incorporating greenscreen that detects more true peaks than other methods.
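The abstract describes greenscreen only at a high level. As a rough illustration of the general idea of masking input-derived artifact regions and discarding peaks that fall in them, the sketch below assumes (this is not stated in the abstract) that each input sample has already been reduced to a list of high-signal intervals, for example by running a peak caller on the input alone. A candidate region is kept in the mask when at least `min_inputs` inputs support it, and nearby regions are merged within `merge_gap` bp. The function names and both parameters are hypothetical choices for illustration, not the published greenscreen procedure or its settings.

```python
from typing import Dict, List, Tuple

# (chromosome, start, end): 0-based, half-open, as in BED files.
Interval = Tuple[str, int, int]


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals on the same chromosome share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def merge_intervals(intervals: List[Interval], gap: int = 0) -> List[Interval]:
    """Merge intervals on the same chromosome that overlap, touch, or lie within `gap` bp."""
    merged: List[Interval] = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2] + gap:
            prev_chrom, prev_start, prev_end = merged[-1]
            merged[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


def build_mask(input_regions: List[List[Interval]], min_inputs: int, merge_gap: int) -> List[Interval]:
    """Hypothetical mask: merged regions supported by at least `min_inputs` input samples."""
    # Pool every input's regions and merge them into candidate loci.
    pooled = [iv for regions in input_regions for iv in regions]
    candidates = merge_intervals(pooled, merge_gap)
    mask: List[Interval] = []
    for cand in candidates:
        # Count how many input samples have at least one region inside this locus.
        support = sum(any(overlaps(cand, iv) for iv in regions) for regions in input_regions)
        if support >= min_inputs:
            mask.append(cand)
    return mask


def filter_peaks(peaks: List[Interval], mask: List[Interval]) -> List[Interval]:
    """Drop peaks that overlap any masked region (a quadratic scan, fine for a sketch)."""
    return [p for p in peaks if not any(overlaps(p, m) for m in mask)]


def masked_fraction(mask: List[Interval], chrom_sizes: Dict[str, int]) -> float:
    """Fraction of the genome covered by the mask, counting overlapping regions once."""
    covered = sum(end - start for _, start, end in merge_intervals(mask))
    return covered / sum(chrom_sizes.values())


if __name__ == "__main__":
    # Toy data for two input samples (made-up coordinates).
    inputs = [
        [("chr1", 100, 200), ("chr2", 500, 600)],
        [("chr1", 150, 260)],
    ]
    mask = build_mask(inputs, min_inputs=2, merge_gap=50)
    peaks = [("chr1", 120, 140), ("chr1", 1000, 1100), ("chr2", 520, 580)]
    print(mask)                        # [('chr1', 100, 260)]
    print(filter_peaks(peaks, mask))   # [('chr1', 1000, 1100), ('chr2', 520, 580)]
    print(masked_fraction(mask, {"chr1": 10_000, "chr2": 10_000}))  # 0.008
```

Here the chr2 region is seen in only one of the two inputs, so it is left out of the mask and the chr2 peak survives. The last line shows how the genome fraction covered by a mask could be compared, in the spirit of the abstract's point that greenscreen covers less of the genome than blacklists. A real pipeline would more likely use an interval tool such as `bedtools intersect -v` than the linear scans used here.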