Wang Su, Zang Chongzhi, Xiao Tengfei, Fan Jingyu, Mei Shenglin, Qin Qian, Wu Qiu, Li Xujuan, Xu Kexin, He Housheng Hansen, Brown Myles, Meyer Clifford A, Liu X Shirley
Shanghai Key Laboratory of Tuberculosis, Shanghai Pulmonary Hospital, Shanghai, 200433, China; Department of Bioinformatics, School of Life Science and Technology, Tongji University, Shanghai, 200092, China.
Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute and Harvard T.H. Chan School of Public Health, Boston, Massachusetts 02215, USA; Center for Functional Cancer Epigenetics, Dana-Farber Cancer Institute, Boston, Massachusetts 02215, USA.
Genome Res. 2016 Oct;26(10):1417-1429. doi: 10.1101/gr.201574.115. Epub 2016 Jul 27.
Model-based analysis of regulation of gene expression (MARGE) is a framework for interpreting the relationship between the H3K27ac chromatin environment and differentially expressed gene sets. The framework has three main functions: MARGE-potential, MARGE-express, and MARGE-cistrome. MARGE-potential defines a regulatory potential (RP) for each gene as the sum of H3K27ac ChIP-seq signals weighted by a function of genomic distance from the transcription start site. The MARGE framework includes a compendium of RPs derived from 365 human and 267 mouse H3K27ac ChIP-seq data sets. Relative RPs, scaled using this compendium, are superior to superenhancers in predicting genes repressed by BET (bromodomain and extraterminal domain) inhibitors. MARGE-express uses logistic regression to retrieve relevant H3K27ac profiles from the compendium and accurately model a query set of differentially expressed genes; it was tested on 671 diverse gene sets from MSigDB. MARGE-cistrome adopts a novel semisupervised learning approach to identify the cis-regulatory elements that regulate a gene set, exploiting H3K27ac signal at DNase I hypersensitive sites identified from published human and mouse DNase-seq data. We tested the framework on newly generated RNA-seq and H3K27ac ChIP-seq profiles obtained after siRNA silencing of multiple transcriptional and epigenetic regulators in a prostate cancer cell line, LNCaP-abl. MARGE-cistrome can predict the binding sites of silenced transcription factors without matched H3K27ac ChIP-seq data. Even when matching H3K27ac ChIP-seq profiles are available, MARGE leverages public H3K27ac profiles to enhance these data. This study demonstrates the advantage of integrating a large compendium of historical epigenetic data for genomic studies of transcriptional regulation.
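The regulatory potential described in the abstract, a sum of H3K27ac signal weighted by a function of distance from the transcription start site, can be illustrated with a minimal sketch. The abstract does not state the weighting function, so the exponential half-decay form, the 10 kb half-decay distance, the 100 kb window, and the names `distance_weight` and `regulatory_potential` are all illustrative assumptions rather than the published MARGE implementation.

```python
def distance_weight(distance_bp, half_decay_bp=10_000.0):
    """Return a weight in (0, 1] that halves every `half_decay_bp` bases.

    Assumption: exponential decay; the abstract only says "a function of
    genomic distance", so this form is chosen for illustration.
    """
    return 0.5 ** (abs(distance_bp) / half_decay_bp)


def regulatory_potential(tss, signal_bins, half_decay_bp=10_000.0,
                         max_distance_bp=100_000):
    """Sum H3K27ac signal around one TSS, weighted by distance.

    tss:          genomic coordinate of the transcription start site
    signal_bins:  iterable of (position, signal) pairs on the same chromosome
    max_distance_bp is a hypothetical cutoff; bins farther away are ignored.
    """
    rp = 0.0
    for position, signal in signal_bins:
        distance = position - tss
        if abs(distance) <= max_distance_bp:
            rp += signal * distance_weight(distance, half_decay_bp)
    return rp


# Toy example: three bins near the TSS and one far outside the window.
bins = [(1_000_000, 4.0), (1_010_000, 2.0), (1_020_000, 8.0), (1_500_000, 100.0)]
print(regulatory_potential(1_000_000, bins))
```

In this toy input the bin at the TSS counts fully, the bins 10 kb and 20 kb away are down-weighted to one half and one quarter, and the distant bin is excluded. Scaling such values against a compendium of RPs from many samples would give the relative RPs the abstract refers to.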