Wallenberg Centre for Molecular Medicine, Linköping University, Linköping, Sweden.
Department of Biomedical and Clinical Sciences, Division of Molecular Medicine and Virology, Faculty of Medicine and Health Sciences, Linköping University, Linköping, Sweden.
Genome Biol. 2023 Aug 10;24(1):185. doi: 10.1186/s13059-023-03027-3.
Cleavage Under Targets and Release Using Nuclease (CUT&RUN) is an increasingly popular technique for mapping genome-wide binding profiles of histone modifications, transcription factors, and co-factors. The ENCODE project and others have compiled blacklists for ChIP-seq that have been widely adopted: these lists contain regions that show high, unstructured signal regardless of cell type or protein target, indicating that the signal in these regions is a false positive. Although CUT&RUN yields results similar to those of ChIP-seq, its biochemistry and downstream data analysis differ. We found that these differences give rise to a CUT&RUN-specific set of undesired high-signal regions.
We compile suspect lists from CUT&RUN data for the human and mouse genomes by identifying regions that are consistently called as peaks in negative controls. Using published CUT&RUN data from our lab and others, we show that the CUT&RUN suspect regions can persist even when peak calling is performed with SEACR or MACS2 against a negative control and after ENCODE blacklist removal. Moreover, we experimentally validate the CUT&RUN suspect lists by performing repeated negative control experiments in which no specific protein is targeted, showing that the lists capture more than 80% of the peaks identified.
We propose that removing these problematic regions can substantially improve peak calling in CUT&RUN experiments, resulting in more reliable datasets.
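As an illustration of the removal step proposed above, the following is a minimal sketch of filtering called peaks against a suspect list. It is not the authors' pipeline: it assumes both the peaks and the suspect regions are available as BED-like lines (chromosome, 0-based start, exclusive end), and the helper names `parse_bed`, `build_index`, `overlaps`, and `filter_peaks` are hypothetical. A peak is dropped if it overlaps a suspect region by at least one base, which is an assumed criterion rather than one stated in the abstract.

```python
from bisect import bisect_right
from collections import defaultdict


def parse_bed(lines):
    """Parse BED-like lines into (chrom, start, end) tuples.

    Blank lines, comments, and track/browser header lines are skipped.
    """
    regions = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        regions.append((chrom, int(start), int(end)))
    return regions


def build_index(regions):
    """Merge overlapping or adjacent regions per chromosome.

    Returns {chrom: (starts, ends)} with both lists sorted, so that
    overlap queries can use binary search.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in regions:
        by_chrom[chrom].append((start, end))
    index = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        merged = []
        for start, end in intervals:
            if merged and start <= merged[-1][1]:
                merged[-1][1] = max(merged[-1][1], end)
            else:
                merged.append([start, end])
        index[chrom] = ([m[0] for m in merged], [m[1] for m in merged])
    return index


def overlaps(index, chrom, start, end):
    """Return True if half-open [start, end) overlaps any indexed region."""
    if chrom not in index:
        return False
    starts, ends = index[chrom]
    # Last merged region that starts before the query ends.
    i = bisect_right(starts, end - 1) - 1
    return i >= 0 and ends[i] > start


def filter_peaks(peaks, suspect_regions):
    """Split peaks into (kept, removed) by overlap with suspect regions."""
    index = build_index(suspect_regions)
    kept, removed = [], []
    for peak in peaks:
        (removed if overlaps(index, *peak) else kept).append(peak)
    return kept, removed


# Toy coordinates for illustration only; these are not real suspect regions.
suspect = parse_bed(["chr1\t100\t200", "chr1\t150\t300", "chr2\t50\t60"])
peaks = parse_bed(
    ["chr1\t250\t400", "chr1\t300\t350", "chr2\t10\t50", "chr3\t0\t10"]
)
kept, removed = filter_peaks(peaks, suspect)
fraction_removed = len(removed) / len(peaks)
```

The same overlap test can be applied to peaks called in a negative control to estimate what fraction of them a suspect list captures. In practice an established interval tool would likely be preferable to hand-written code; the sketch only makes the logic explicit.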