Nordin Anna, Pagella Pierfrancesco, Zambanini Gianluca, Cantù Claudio
Wallenberg Centre for Molecular Medicine, Linköping University, Linköping, Sweden.
Department of Biomedical and Clinical Sciences, Division of Molecular Medicine and Virology, Faculty of Medicine and Health Sciences, Linköping University, Linköping, Sweden.
Nucleic Acids Res. 2024 Apr 24;52(7):e40. doi: 10.1093/nar/gkae180.
Genome-wide binding assays aspire to map the complete binding pattern of gene regulators. Common practice relies on replication (duplicates or triplicates) and high-stringency statistics to favor false negatives over false positives. Here we show that duplicates and triplicates of CUT&RUN are not sufficient to discover the entire activity of transcriptional regulators. We introduce ICEBERG (Increased Capture of Enrichment By Exhaustive Replicate aGgregation), a pipeline that harnesses large numbers of CUT&RUN replicates to discover the full set of binding events and chart the line between false positives and false negatives. We employed ICEBERG to map the full set of H3K4me3-marked regions, the targets of the co-factor β-catenin, and those of the transcription factor TBX3, in human colorectal cancer cells. The ICEBERG datasets allow benchmarking of individual replicates and comparison of the performance of peak calling and replication approaches, and they expose the arbitrary nature of strategies to identify reproducible peaks. Instead of a static view of genomic targets, ICEBERG establishes a spectrum of detection probabilities across the genome for a given factor, underscoring the intrinsic dynamicity of its mechanism of action and making it possible to distinguish frequent from rare regulation events. Finally, ICEBERG discovered instances, undetectable with other approaches, that underlie novel mechanisms of colorectal cancer progression.
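As an illustration of the "spectrum of detection probabilities" idea only, the minimal Python sketch below computes, for each candidate region, the fraction of replicates in which at least one called peak overlaps it. This is not the ICEBERG pipeline itself, whose implementation the abstract does not describe. The function names `overlaps` and `detection_frequency`, the half-open interval convention, and the toy coordinates are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: (chromosome, start, end), half-open coordinates.
Interval = Tuple[str, int, int]


def overlaps(a: Interval, b: Interval) -> bool:
    """Return True if two half-open intervals on the same chromosome overlap."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def detection_frequency(
    regions: List[Interval], replicates: List[List[Interval]]
) -> Dict[Interval, float]:
    """For each region, return the fraction of replicates with an overlapping peak.

    Assumption: a region counts as 'detected' in a replicate if any peak
    called in that replicate overlaps it by at least one base.
    """
    if not replicates:
        raise ValueError("at least one replicate is required")
    freqs: Dict[Interval, float] = {}
    for region in regions:
        hits = sum(
            any(overlaps(region, peak) for peak in peaks) for peaks in replicates
        )
        freqs[region] = hits / len(replicates)
    return freqs


# Toy data (invented for illustration): two candidate regions, four replicates.
example_regions = [("chr1", 100, 200), ("chr1", 500, 600)]
example_replicates = [
    [("chr1", 150, 250)],
    [("chr1", 120, 180), ("chr1", 550, 650)],
    [("chr1", 90, 110)],
    [],  # a replicate in which no peak was called
]
example_freqs = detection_frequency(example_regions, example_replicates)
```

In this toy case the first region is detected in three of four replicates and the second in only one. With duplicates or triplicates, the second region would often be missed or discarded as irreproducible, which is the kind of rare event the abstract argues a large replicate collection can capture.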