Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute and Harvard T.H. Chan School of Public Health, Boston, Massachusetts 02215, USA.
Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology and Broad Institute, Cambridge, Massachusetts 02139, USA.
Nat Commun. 2017 Apr 7;8:15011. doi: 10.1038/ncomms15011.
Chromatin-state analysis is widely applied in studies of development and disease. However, existing methods operate at a single length scale and therefore cannot distinguish large domains from isolated elements of the same type. To overcome this limitation, we present a hierarchical hidden Markov model, diHMM, to systematically annotate chromatin states at multiple length scales. We apply diHMM to analyse a public ChIP-seq data set. diHMM not only accurately captures nucleosome-level information, but also identifies domain-level states that vary in nucleosome-level state composition, spatial distribution and functionality. The domain-level states recapitulate known patterns such as super-enhancers, bivalent promoters and Polycomb-repressed regions, and identify additional patterns whose biological functions are not yet characterized. By integrating chromatin-state information with gene expression and Hi-C data, we identify context-dependent functions of nucleosome-level states. Thus, diHMM provides a powerful tool for investigating the role of higher-order chromatin structure in gene regulation.
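The idea of annotating at two length scales can be illustrated with a toy two-level HMM. The sketch below is not the diHMM implementation: every probability is made up for illustration, the names `viterbi_two_level`, `T`, `A` and `E` are hypothetical, there is a single binary mark per bin, and the domain is allowed to switch at any bin (a simplification). Each hidden state is a pair (domain, nucleosome-level state), and the nucleosome-level transition matrix depends on the current domain.

```python
import math

DOMAINS = ["background", "enhancer_domain"]   # domain-level labels (toy)
STATES = ["quiescent", "enhancer"]            # nucleosome-level labels (toy)

# Illustrative parameters only; none of these values come from the paper.
T = [[0.95, 0.05],                            # T[d1][d2]: domain transitions
     [0.05, 0.95]]
A = [[[0.90, 0.10], [0.80, 0.20]],            # A[d][s1][s2] inside domain 0
     [[0.30, 0.70], [0.10, 0.90]]]            # A[d][s1][s2] inside domain 1
E = [[0.95, 0.05],                            # E[s][o]: P(mark observed = o | s)
     [0.05, 0.95]]


def viterbi_two_level(obs):
    """Most likely (domain, state) pair per bin for a 0/1 observation list."""
    if not obs:
        return []
    hidden = [(d, s) for d in range(2) for s in range(2)]
    # Uniform start over the four joint states, in log space.
    score = {h: math.log(0.25) + math.log(E[h[1]][obs[0]]) for h in hidden}
    back = []
    for o in obs[1:]:
        new_score, pointers = {}, {}
        for d2, s2 in hidden:
            def step(prev, d2=d2, s2=s2):
                d1, s1 = prev
                # Joint transition: pick the next domain, then the next
                # nucleosome-level state using that domain's matrix.
                return (score[prev] + math.log(T[d1][d2])
                        + math.log(A[d2][s1][s2]))
            best_prev = max(hidden, key=step)
            new_score[(d2, s2)] = step(best_prev) + math.log(E[s2][o])
            pointers[(d2, s2)] = best_prev
        back.append(pointers)
        score = new_score
    # Backtrack from the best final state.
    path = [max(score, key=score.get)]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


if __name__ == "__main__":
    # One isolated marked bin, then a run of twelve marked bins.
    obs = [0] * 10 + [1] + [0] * 10 + [1] * 12 + [0] * 10
    for i, (d, s) in enumerate(viterbi_two_level(obs)):
        print(i, obs[i], DOMAINS[d], STATES[s])
```

With these toy parameters, the isolated marked bin is labelled as an enhancer state inside the background domain, whereas the run of marked bins is assigned to the enhancer domain. A single-scale model with only the nucleosome-level states would give both the same label.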