Cluster of Excellence for Multimodal Computing and Interaction, Saarland University, Saarland Informatics Campus, 66123 Saarbrücken, Germany.
Max Planck Institute for Informatics, Saarland Informatics Campus, 66123 Saarbrücken, Germany.
Nucleic Acids Res. 2021 Oct 11;49(18):10397-10418. doi: 10.1093/nar/gkab798.
Understanding how epigenetic variation in non-coding regions is involved in distal gene-expression regulation is an important problem. Regulatory regions can be associated with genes using large-scale datasets of epigenetic and expression data. However, these associations are difficult to interpret for regions with complex epigenomic signals and for enhancers that regulate many genes. We present StitchIt, an approach that dissects epigenetic variation in a gene-specific manner to detect regulatory elements (REMs) without relying on peak calls in individual samples. StitchIt segments epigenetic signal tracks over many samples to determine the location and the target genes of a REM simultaneously. We show that this approach leads to more accurate and refined REM detection than standard methods, even on heterogeneous datasets, which are challenging to model. Also, StitchIt REMs are highly enriched in experimentally determined chromatin interactions and expression quantitative trait loci. We validated several newly predicted REMs using CRISPR-Cas9 experiments, thereby demonstrating the reliability of StitchIt. StitchIt is able to dissect regulation in superenhancers and predicts thousands of putative REMs that go unnoticed using peak-based approaches, suggesting that a large part of the regulome might be uncharted waters.
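The abstract describes segmenting signal tracks across many samples so that each segment is tied to a target gene at the same time as it is located. As a rough illustration of that general idea only, and not StitchIt's actual segmentation algorithm (which is described in the full paper), the Python sketch below assumes a hypothetical matrix of binned epigenetic signal (samples × bins) around one gene. It greedily merges adjacent bins whose cross-sample profiles are highly correlated, then scores each resulting segment by its Pearson correlation with the gene's expression. The function names `segment_and_associate` and `pearson` and the `merge_threshold` parameter are hypothetical.

```python
import numpy as np


def pearson(x, y):
    """Pearson correlation of two 1-D arrays; returns 0.0 if either is constant."""
    xc = np.asarray(x, dtype=float) - np.mean(x)
    yc = np.asarray(y, dtype=float) - np.mean(y)
    denom = np.sqrt((xc * xc).sum() * (yc * yc).sum())
    return 0.0 if denom == 0 else float((xc * yc).sum() / denom)


def segment_and_associate(signal, expression, merge_threshold=0.9):
    """Toy gene-specific segmentation (hypothetical, not StitchIt's method).

    signal: array of shape (n_samples, n_bins), epigenetic signal per bin and sample.
    expression: array of shape (n_samples,), expression of one target gene.
    Returns a list of (start_bin, end_bin_exclusive, correlation_with_expression).
    """
    signal = np.asarray(signal, dtype=float)
    n_samples, n_bins = signal.shape
    segments = []
    start = 0
    profile = signal[:, 0].copy()  # summed cross-sample profile of the open segment
    for b in range(1, n_bins):
        column = signal[:, b]
        if pearson(profile, column) >= merge_threshold:
            # Adjacent bin varies across samples like the open segment: extend it.
            profile += column
        else:
            # Profile changes: close the open segment and start a new one at bin b.
            segments.append((start, b, profile))
            start = b
            profile = column.copy()
    segments.append((start, n_bins, profile))
    # Score each segment against the gene's expression across the same samples.
    return [(s, e, pearson(p, expression)) for s, e, p in segments]


if __name__ == "__main__":
    expr = np.array([1, 2, 3, 4, 5, 6], dtype=float)
    # Bins 0-1 rise with expression; bins 2-3 fall as expression rises.
    sig = np.column_stack([expr, 2 * expr, expr[::-1], 2 * expr[::-1]])
    for start, end, r in segment_and_associate(sig, expr):
        print(f"bins {start}-{end}: r = {r:+.2f}")
```

In this toy input, the two rising bins form one segment positively correlated with expression and the two falling bins form a second, negatively correlated segment. A segment with a strong correlation in either direction would be a candidate gene-specific regulatory region under this simplified scheme; the paper's method and its statistical testing should be consulted for the real procedure.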