Lee Christopher, Wang Kai, Qin Tingting, Sartor Maureen A
Department of Computational Medicine and Bioinformatics, School of Medicine, University of Michigan, Ann Arbor, MI, United States.
Department of Biostatistics, School of Public Health, University of Michigan, Ann Arbor, MI, United States.
Front Genet. 2020 Mar 6;11:199. doi: 10.3389/fgene.2020.00199. eCollection 2020.
Large sets of genomic regions are generated by the initial analysis of various genome-wide sequencing data, such as ChIP-seq and ATAC-seq experiments. Gene set enrichment (GSE) methods are commonly employed to identify the pathways associated with these regions. For the pathways and other gene sets (e.g., GO terms) found to be significant, it is of great interest to know the extent to which each is driven by binding near transcription start sites (TSS) or near enhancers. Currently, no tool performs such an analysis. Here, we present a method that addresses this question to complement GSE methods for genomic regions. Specifically, the new method uses a non-parametric test to determine whether the genomic regions in a gene set are significantly closer to a TSS (or to an enhancer) than expected by chance, given the total list of genomic regions. Combining the results from a GSE test with our novel method provides additional information regarding the mode of regulation of each pathway, and additional evidence that the pathway is truly enriched. We illustrate our new method with a large set of ENCODE ChIP-seq data, using the Bioconductor package. The results show that our method is a powerful complementary approach to help researchers interpret large sets of genomic regions.
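The proximity test described in the abstract can be sketched as follows. The abstract does not say which non-parametric test is used, so this sketch assumes a one-sided permutation test on the median distance as a stand-in. The function name `proximity_permutation_test` and its parameters are hypothetical and are not the API of the Bioconductor package.

```python
import random
from statistics import median


def proximity_permutation_test(all_distances, in_gene_set, n_perm=2000, seed=0):
    """One-sided permutation test: are gene-set regions closer to a TSS than chance?

    Illustrative stand-in only; the abstract does not name the test it uses.

    all_distances: distance (bp) from each genomic region to its nearest TSS
                   (or enhancer), for the total list of regions.
    in_gene_set:   parallel list of booleans, True if the region is assigned
                   to a gene in the gene set of interest.
    Returns (observed_median, p_value).
    """
    if len(all_distances) != len(in_gene_set):
        raise ValueError("inputs must have the same length")
    set_distances = [d for d, flag in zip(all_distances, in_gene_set) if flag]
    k = len(set_distances)
    if k == 0 or k == len(all_distances):
        raise ValueError("gene set must contain some, but not all, regions")

    observed = median(set_distances)
    rng = random.Random(seed)
    # Count random subsets of the same size whose median distance is at
    # least as small as the observed gene-set median.
    hits = 0
    for _ in range(n_perm):
        if median(rng.sample(all_distances, k)) <= observed:
            hits += 1
    # Add-one correction keeps the p-value strictly positive.
    return observed, (hits + 1) / (n_perm + 1)


# Toy data (made up for illustration): 200 regions, of which the 20 closest
# to a TSS belong to the gene set.
distances = list(range(100, 20100, 100))
membership = [i < 20 for i in range(len(distances))]
obs, p = proximity_permutation_test(distances, membership)
print(f"median distance in gene set: {obs} bp, p = {p:.4f}")
```

A small p-value here suggests that the regions assigned to the gene set lie closer to TSSs than a random subset of the same size drawn from the total region list. Passing distances to enhancers instead of distances to TSSs gives the enhancer version of the same test.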