Wang Yong, Jiang Rui, Wong Wing Hung
Department of Statistics, Department of Biomedical Data Science, Bio-X Program, Stanford University, Stanford, CA 94305, USA.
Academy of Mathematics and Systems Science, National Center for Mathematics and Interdisciplinary Sciences, Chinese Academy of Sciences, Beijing 100080, China.
Natl Sci Rev. 2016 Jun;3(2):240-251. doi: 10.1093/nsr/nww025. Epub 2016 Apr 19.
The cell packs a large amount of genetic and regulatory information into a structure known as chromatin, in which DNA is wrapped around histone proteins and packed remarkably tightly. To express a gene in a specific coding region, the chromatin opens up, and a DNA loop may form between interacting enhancers and promoters. Furthermore, the Mediator and cohesin complexes, sequence-specific transcription factors, and RNA polymerase II are recruited and work together to finely regulate the expression level. There is a pressing need to understand how the information about when, where, and to what degree genes should be expressed is embedded in chromatin structure and gene regulatory elements. Thanks to large consortia such as the Encyclopedia of DNA Elements (ENCODE) and Roadmap Epigenomics projects, extensive data on chromatin accessibility and transcript abundance are available across many tissues and cell types. These rich data offer an exciting opportunity to model causal regulatory relationships. Here, we review the current experimental approaches, foundational data, computational problems, interpretive frameworks, and integrative models that will enable accurate interpretation of the regulatory landscape. In particular, we discuss efforts to organize, analyze, model, and integrate DNA accessibility data, transcriptional data, and functional genomic regions. We believe that these efforts will eventually help us understand the information flow within the cell and will influence research directions across many fields.