Poly-Enrich: count-based methods for gene set enrichment testing with genomic regions.

Author information

Lee Christopher T, Cavalcante Raymond G, Lee Chee, Qin Tingting, Patil Snehal, Wang Shuze, Tsai Zing T Y, Boyle Alan P, Sartor Maureen A

Affiliations

Biostatistics Department, University of Michigan, Ann Arbor, MI 48109, USA.

Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA.

Publication information

NAR Genom Bioinform. 2020 Mar;2(1):lqaa006. doi: 10.1093/nargab/lqaa006. Epub 2020 Feb 6.

Abstract

Gene set enrichment (GSE) testing enhances the biological interpretation of ChIP-seq data and other large sets of genomic regions. Our group has previously introduced two GSE methods for genomic regions: ChIP-Enrich for narrow regions and Broad-Enrich for broad regions. Here, we introduce Poly-Enrich, which has wider applicability, additional capabilities and models the number of peaks assigned to a gene using a generalized additive model with a negative binomial family to determine gene set enrichment, while adjusting for gene locus length. As opposed to ChIP-Enrich, Poly-Enrich works well even when nearly all genes have a peak, illustrated by using Poly-Enrich to characterize pathways and types of genic regions enriched with different families of repetitive elements. By comparing Poly-Enrich and ChIP-Enrich results with ENCODE ChIP-seq data, we found that the optimal test depends more on the pathway being regulated than on properties of the transcription factors. Using known transcription factor functions, we discovered clusters of related biological processes consistently better modeled with Poly-Enrich. This suggests that the regulation of certain processes may be modified by multiple binding events, better modeled by a count-based method. Our new hybrid method automatically uses the optimal method for each gene set, with correct FDR-adjustment.
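
The abstract does not spell out how the hybrid method combines the two tests or how the FDR adjustment is applied. As a minimal sketch only, assume each gene set has one p-value from a binary (ChIP-Enrich-style) test and one from a count-based (Poly-Enrich-style) test, that the two are combined with a Bonferroni-style rule of twice the smaller p-value (capped at 1), and that a Benjamini-Hochberg adjustment follows. The function names `hybrid_pvalues` and `benjamini_hochberg` and the example p-values are hypothetical illustrations, not the authors' implementation.

```python
def hybrid_pvalues(p_binary, p_count):
    """Combine two p-values per gene set.

    Assumption of this sketch: a Bonferroni-style rule, 2 * min(p1, p2),
    capped at 1, which corrects for picking the better of two tests.
    """
    return [min(1.0, 2.0 * min(a, b)) for a, b in zip(p_binary, p_count)]


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values for three gene sets (illustration only).
p_binary = [0.001, 0.04, 0.6]   # binary, ChIP-Enrich-style test
p_count = [0.01, 0.002, 0.5]    # count-based, Poly-Enrich-style test

combined = hybrid_pvalues(p_binary, p_count)
fdr = benjamini_hochberg(combined)
```

Doubling the smaller p-value keeps the combined value conservative no matter which test wins for a given gene set, so the subsequent FDR step can treat the combined values like ordinary p-values.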

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b7b8/7671343/96d91902571f/lqaa006fig1.jpg
