Department of Ecology and Evolution, University of Chicago, Chicago, IL 60637, USA.
Bioinformatics. 2013 May 1;29(9):1199-205. doi: 10.1093/bioinformatics/btt126. Epub 2013 Mar 19.
Histone modifications regulate chromatin structure and gene expression. Although nucleosome formation is known to be affected by primary DNA sequence composition, no sequence signature has been identified for histone modifications. Sites densely occupied by H3K4me3 nucleosomes are known to have a low density of other nucleosomes and to be associated with gene activation. This observation suggests that H3K4me3 nucleosomes differ in sequence composition from other nucleosomes.
To understand the relationship between genome sequence and chromatin structure, we studied DNA sequences at histone modification sites in various human cell types. We found sequence specificity for H3K4me3, but not for other histone modifications. Using the sequence specificities of H3 and H3K4me3 nucleosomes, we developed a model that computes the probability of H3K4me3 occupation at each base pair from the genome sequence context.
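As a rough illustration of what computing a per-base-pair occupation probability from sequence context can look like, the sketch below scores each position with a windowed k-mer log-likelihood ratio between two sequence sets and converts the score to a posterior probability. This is not the model described in the paper: the k-mer representation, the window size, the prior, and all function names (`kmer_model`, `log_freq`, `occupancy_probability`) are assumptions made for illustration only.

```python
import math
from collections import Counter


def kmer_model(seqs, k, pseudocount=1.0):
    """Count k-mers in a set of sequences (hypothetical helper).

    Additive smoothing keeps unseen k-mers from getting zero frequency.
    """
    counts = Counter()
    for s in seqs:
        s = s.upper()
        for i in range(len(s) - k + 1):
            counts[s[i:i + k]] += 1
    total = sum(counts.values()) + pseudocount * 4 ** k
    return {"counts": counts, "total": total, "pseudocount": pseudocount}


def log_freq(model, kmer):
    """Smoothed log frequency of one k-mer under a model."""
    count = model["counts"].get(kmer, 0) + model["pseudocount"]
    return math.log(count / model["total"])


def occupancy_probability(seq, modified, background, k=2,
                          half_window=5, prior=0.5):
    """Per-base posterior that the local context resembles `modified`.

    Assumption: k-mers inside the window contribute independently, so
    their log-likelihood ratios can simply be summed.
    """
    seq = seq.upper()
    n_kmers = max(0, len(seq) - k + 1)
    llr = [
        log_freq(modified, seq[i:i + k]) - log_freq(background, seq[i:i + k])
        for i in range(n_kmers)
    ]
    prior_logit = math.log(prior / (1.0 - prior))
    probs = []
    for pos in range(len(seq)):
        lo = max(0, pos - half_window)
        hi = min(n_kmers, pos + half_window + 1)
        score = prior_logit + sum(llr[lo:hi])
        # Clamp the score so math.exp cannot overflow on long windows.
        score = max(-50.0, min(50.0, score))
        probs.append(1.0 / (1.0 + math.exp(-score)))
    return probs
```

In this sketch, `modified` would be built from sequences under H3K4me3 nucleosomes and `background` from sequences under unmodified H3 nucleosomes. A realistic model would need far larger training sets and a validated choice of `k`, window, and prior.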
A comparison of our predictions with experimental data suggests that our method performs well, revealing a strong association between H3K4me3 and specific genomic DNA context. High probabilities of H3K4me3 occupation occur at transcription start and termination sites, at exon boundaries, and at binding sites of transcription regulators involved in chromatin modification activities, including histone acetylases and enhancer- and insulator-associated factors. Thus, the human genome sequence contains signatures for chromatin modifications essential for gene regulation and development. Our method may be applied to find new sequence elements that function through chromatin modulation.
Software and supplementary data are available at Bioinformatics online.