Inoue Fumitaka, Kircher Martin, Martin Beth, Cooper Gregory M, Witten Daniela M, McManus Michael T, Ahituv Nadav, Shendure Jay
Department of Bioengineering and Therapeutic Sciences, Institute for Human Genetics, University of California San Francisco, San Francisco, California 94158, USA.
Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA.
Genome Res. 2017 Jan;27(1):38-52. doi: 10.1101/gr.212092.116. Epub 2016 Nov 9.
Candidate enhancers can be identified on the basis of chromatin modifications, the binding of chromatin modifiers and transcription factors and cofactors, or chromatin accessibility. However, validating such candidates as bona fide enhancers requires functional characterization, typically achieved through reporter assays that test whether a sequence can increase expression of a transcriptional reporter via a minimal promoter. A longstanding concern is that reporter assays are mainly implemented on episomes, which are thought to lack physiological chromatin. However, the magnitude and determinants of differences in cis-regulation for regulatory sequences residing in episomes versus chromosomes remain almost completely unknown. To address this systematically, we developed and applied a novel lentivirus-based massively parallel reporter assay (lentiMPRA) to directly compare the functional activities of 2236 candidate liver enhancers in an episomal versus a chromosomally integrated context. We find that the activities of chromosomally integrated sequences are substantially different from the activities of the identical sequences assayed on episomes, and furthermore are correlated with different subsets of ENCODE annotations. The results of chromosomally based reporter assays are also more reproducible and more strongly predictable by both ENCODE annotations and sequence-based models. With a linear model that combines chromatin annotations and sequence information, we achieve a Pearson's R of 0.362 for predicting the results of chromosomally integrated reporter assays. This level of prediction is better than with either chromatin annotations or sequence information alone and also outperforms predictive models of episomal assays. Our results have broad implications for how cis-regulatory elements are identified, prioritized and functionally validated.
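The abstract reports prediction quality as Pearson's R between model output and measured reporter activity, for a linear model that combines chromatin annotations with sequence information. Below is a minimal sketch of that kind of evaluation. It assumes each candidate enhancer already has a numeric activity value and that the chromatin and sequence features are already encoded as numeric columns. The function names (`pearson_r`, `fit_ols`, `predict`) and all feature and activity values are hypothetical toy inputs, not data from the study. The sketch uses plain ordinary least squares, and the paper's actual model, feature encoding and validation scheme may differ.

```python
import math
from typing import List, Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: features are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):  # back-substitution
        w[r] = (m[r][n] - sum(m[r][c] * w[c] for c in range(r + 1, n))) / m[r][r]
    return w


def fit_ols(X: List[List[float]], y: List[float]) -> List[float]:
    """Ordinary least squares with an intercept, via the normal equations."""
    Xi = [[1.0] + row for row in X]  # prepend an intercept column
    p = len(Xi[0])
    xtx = [[sum(r[i] * r[j] for r in Xi) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * t for r, t in zip(Xi, y)) for i in range(p)]
    return solve(xtx, xty)


def predict(w: List[float], X: List[List[float]]) -> List[float]:
    """Linear prediction: intercept w[0] plus weighted features."""
    return [w[0] + sum(wi * xi for wi, xi in zip(w[1:], row)) for row in X]


# Toy, made-up values (NOT data from the study): one hypothetical chromatin
# column and one hypothetical sequence column per candidate enhancer.
chromatin = [[0.2], [1.5], [0.9], [2.1], [0.4], [1.8], [1.1], [0.7]]
sequence = [[3.0], [1.0], [2.5], [0.5], [2.0], [1.5], [0.8], [2.2]]
activity = [0.9, 1.6, 1.5, 1.8, 1.0, 2.0, 1.2, 1.4]

combined = [c + s for c, s in zip(chromatin, sequence)]  # concatenate columns
for name, X in [("chromatin", chromatin), ("sequence", sequence), ("combined", combined)]:
    w = fit_ols(X, activity)
    print(name, round(pearson_r(predict(w, X), activity), 3))
```

This sketch scores each model on the same sequences it was fit to, which is only for illustration. With an intercept, in-sample R for OLS cannot decrease when columns are added. A claim that a combined model beats either feature set alone is therefore only informative when R is computed on held-out sequences, for example with cross-validation.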