Capurso Daniel, Bengtsson Henrik, Segal Mark R
Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, CA 94158, USA.
Department of Epidemiology and Biostatistics, University of California, San Francisco, CA 94158, USA.
Nucleic Acids Res. 2016 Mar 18;44(5):2028-35. doi: 10.1093/nar/gkw070. Epub 2016 Feb 10.
The spatial organization of the genome influences cellular function, notably gene regulation. Recent studies have assessed the three-dimensional (3D) co-localization of functional annotations (e.g. centromeres, long terminal repeats) using 3D genome reconstructions from Hi-C (genome-wide chromosome conformation capture) data; however, corresponding assessments for continuous functional genomic data (e.g. chromatin immunoprecipitation-sequencing (ChIP-seq) peak height) are lacking. Here, we demonstrate that applying bump hunting via the patient rule induction method (PRIM) to ChIP-seq data superposed on a Saccharomyces cerevisiae 3D genome reconstruction can discover 'functional 3D hotspots', regions in 3-space for which the mean ChIP-seq peak height is significantly elevated. For the transcription factor Swi6, the top hotspot by P-value contains MSB2 and ERG11, known Swi6 target genes located on different chromosomes. We verify this finding in three ways. First, this top hotspot is relatively stable under PRIM across parameter settings. Second, this hotspot is among the top hotspots by mean outcome identified by an alternative algorithm, k-Nearest Neighbor (k-NN) regression. Third, the distance between MSB2 and ERG11 is smaller than expected, as assessed by resampling, in two other 3D reconstructions generated via different normalization and reconstruction algorithms. This analytic approach can discover functional 3D hotspots and potentially reveal novel regulatory interactions.
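To make the bump-hunting idea in the abstract concrete, the following is a minimal sketch of the top-down peeling phase of PRIM on synthetic data. It is an illustration under stated assumptions, not the authors' implementation: the function name `prim_peel`, the parameters `alpha` and `min_support`, the synthetic coordinates, and the planted hotspot are all hypothetical. The sketch also simplifies PRIM by stopping as soon as no peel raises the box mean, and it omits the pasting step and the P-value assessment mentioned in the abstract.

```python
import numpy as np

def prim_peel(X, y, alpha=0.1, min_support=0.05):
    """Simplified PRIM peeling: shrink an axis-aligned box to raise mean(y).

    X: (n, d) coordinates (e.g. 3D positions of genomic loci).
    y: (n,) outcome (e.g. ChIP-seq peak height at each locus).
    alpha: fraction of in-box points considered for removal per peel.
    min_support: smallest allowed fraction of points left in the box.
    Returns (box, inside): box is (d, 2) with [lower, upper] per axis,
    inside is a boolean mask of the points in the final box.
    """
    n, d = X.shape
    inside = np.ones(n, dtype=bool)
    box = np.column_stack([X.min(axis=0), X.max(axis=0)]).astype(float)
    min_points = max(1, int(np.ceil(min_support * n)))

    while True:
        current_mean = y[inside].mean()
        best = None  # (mean, axis, side, cut, mask) of the best candidate peel
        for j in range(d):
            xj = X[inside, j]
            lower_cut = np.quantile(xj, alpha)
            upper_cut = np.quantile(xj, 1.0 - alpha)
            candidates = (
                (0, lower_cut, X[:, j] >= lower_cut),  # peel the low face
                (1, upper_cut, X[:, j] <= upper_cut),  # peel the high face
            )
            for side, cut, within in candidates:
                keep = inside & within
                k = keep.sum()
                # Skip peels that remove nothing or leave too few points.
                if k < min_points or k == inside.sum():
                    continue
                m = y[keep].mean()
                if best is None or m > best[0]:
                    best = (m, j, side, cut, keep)
        # Simplification: stop when no peel improves the box mean.
        if best is None or best[0] <= current_mean:
            break
        _, j, side, cut, inside = best
        box[j, side] = cut
    return box, inside

# Synthetic demo (assumed data): uniform 3D points with a planted hotspot.
rng = np.random.default_rng(0)
coords = rng.uniform(0.0, 1.0, size=(2000, 3))
in_hot = np.all((coords > 0.6) & (coords < 0.9), axis=1)
signal = rng.normal(0.0, 0.2, size=2000) + 5.0 * in_hot

box, inside = prim_peel(coords, signal)
print("box bounds per axis:\n", box)
print("mean inside box:", signal[inside].mean(), "global mean:", signal.mean())
```

In a real analysis, `coords` would come from a 3D genome reconstruction and `signal` from ChIP-seq peak heights mapped onto it. The significance of a candidate box would still need to be assessed separately, for example by resampling.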