Thurman Robert E, Day Nathan, Noble William S, Stamatoyannopoulos John A
Division of Medical Genetics, University of Washington, Seattle, Washington 98195, USA.
Genome Res. 2007 Jun;17(6):917-27. doi: 10.1101/gr.6081407.
It has long been posited that human and other large genomes are organized into higher-order (i.e., greater than gene-sized) functional domains. We hypothesized that diverse experimental data types generated by The ENCODE Project Consortium could be combined to delineate active and quiescent or repressed functional domains and thereby illuminate the higher-order functional architecture of the genome. To address this, we coupled wavelet analysis with hidden Markov models for unbiased discovery of "domain-level" behavior in high-resolution functional genomic data, including activating and repressive histone modifications, RNA output, and DNA replication timing. We find that higher-order patterns in these data types are largely concordant and may be analyzed collectively in the context of HeLa cells to delineate 53 active and 62 repressed functional domains within the ENCODE regions. Active domains comprise approximately 44% of the ENCODE regions but contain approximately 75%-80% of annotated genes, transcripts, and CpG islands. Repressed domains are enriched in certain classes of repetitive elements and, surprisingly, in evolutionarily conserved nonexonic sequences. The functional domain structure of the ENCODE regions appears to be largely stable across different cell types. Taken together, our results suggest that higher-order functional domains represent a fundamental organizing principle of human genome architecture.
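As a rough illustration of how wavelet analysis can be coupled with a hidden Markov model to call domains, the sketch below smooths a one-dimensional signal to a coarse scale and then decodes a two-state path from it. It is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not specify the wavelet, the number of states, or any parameters, so the Haar approximation, the Gaussian emissions, and the names `haar_smooth`, `viterbi_two_state`, `to_domains`, `level`, `means`, `sd`, and `stay_prob` are all hypothetical choices made for illustration.

```python
import numpy as np


def haar_smooth(signal, level):
    """Coarse-scale Haar approximation of a 1-D signal (assumed wavelet).

    Averaging over blocks of 2**level samples is what remains of a Haar
    decomposition once all finer-scale detail coefficients are set to zero.
    """
    x = np.asarray(signal, dtype=float)
    block = 2 ** level
    pad = (-len(x)) % block                      # pad up to a whole block
    padded = np.pad(x, (0, pad), mode="edge")
    block_means = padded.reshape(-1, block).mean(axis=1)
    return np.repeat(block_means, block)[: len(x)]


def viterbi_two_state(obs, means, sd, stay_prob):
    """Most likely state path of a 2-state HMM with Gaussian emissions.

    State 0 and state 1 stand in for "repressed" and "active"; the emission
    means, shared standard deviation, and self-transition probability are
    illustrative parameters, not values from the paper.
    Assumes a non-empty observation sequence.
    """
    obs = np.asarray(obs, dtype=float)
    means = np.asarray(means, dtype=float)
    # Log emission scores, shape (T, 2); terms shared by both states dropped.
    log_emit = -0.5 * ((obs[:, None] - means[None, :]) / sd) ** 2
    log_trans = np.log(np.array([[stay_prob, 1.0 - stay_prob],
                                 [1.0 - stay_prob, stay_prob]]))
    n = len(obs)
    score = np.log(0.5) + log_emit[0]            # uniform initial state
    back = np.zeros((n, 2), dtype=int)
    for t in range(1, n):
        cand = score[:, None] + log_trans        # cand[i, j]: state i -> j
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0) + log_emit[t]
    path = np.empty(n, dtype=int)
    path[-1] = score.argmax()
    for t in range(n - 1, 0, -1):                # trace back pointers
        path[t - 1] = back[t, path[t]]
    return path


def to_domains(path):
    """Collapse a non-empty state path into (start, end_exclusive, state) runs."""
    path = np.asarray(path)
    breaks = np.flatnonzero(np.diff(path)) + 1
    starts = np.concatenate(([0], breaks))
    ends = np.concatenate((breaks, [len(path)]))
    return [(int(s), int(e), int(path[s])) for s, e in zip(starts, ends)]
```

In use, one would call `haar_smooth` on a per-position data track, pass the result to `viterbi_two_state`, and read contiguous domains off `to_domains`. Smoothing first keeps the HMM from reacting to fine-scale fluctuations, and a `stay_prob` close to 1 penalizes frequent switching, which together favor long, domain-sized segments.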