Institute for Systems Biology, Seattle, Washington, USA.
[1] Seattle Biomedical Research Institute, Seattle, Washington, USA. [2].
Nat Methods. 2014 Jun;11(6):689-94. doi: 10.1038/nmeth.2924. Epub 2014 Apr 13.
Genomic information is encoded on a wide range of distance scales, ranging from tens of bases to megabases. We developed a multiscale framework to analyze and visualize the information content of genomic signals. Different types of signals, such as G+C content or DNA methylation, are characterized by distinct patterns of signal enrichment or depletion across scales spanning several orders of magnitude. These patterns are associated with a variety of genomic annotations. By integrating the information across all scales, we demonstrated improved prediction of gene expression from polymerase II chromatin immunoprecipitation sequencing (ChIP-seq) measurements, and we observed that gene expression differences in colorectal cancer are related to methylation patterns that extend beyond the single-gene scale. Our software is available at https://github.com/tknijnen/msr/.
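To make the idea of scale-dependent enrichment or depletion concrete, the following is a minimal sketch. It is not the authors' method or the code in their repository. It assumes a simple stand-in: the signal is averaged in non-overlapping windows of several widths and compared with the overall mean. The function names (`gc_indicator`, `window_means`, `multiscale_enrichment`) and the toy sequence are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Sequence


def gc_indicator(seq: str) -> List[int]:
    """Per-base G+C signal: 1 for G or C, 0 for anything else."""
    return [1 if base in "GCgc" else 0 for base in seq]


def window_means(signal: Sequence[float], width: int) -> List[float]:
    """Mean of the signal in consecutive non-overlapping windows.

    A trailing partial window is dropped.
    """
    n_windows = len(signal) // width
    return [
        sum(signal[i * width:(i + 1) * width]) / width
        for i in range(n_windows)
    ]


def multiscale_enrichment(
    signal: Sequence[float], widths: Sequence[int]
) -> Dict[int, List[float]]:
    """For each window width (scale), return window mean minus overall mean.

    Positive values mark enrichment at that scale, negative values depletion.
    This is an illustrative assumption, not the published framework.
    """
    overall = sum(signal) / len(signal)
    return {
        w: [m - overall for m in window_means(signal, w)]
        for w in widths
    }


# Toy sequence (hypothetical): a GC-rich half followed by an AT-rich half.
seq = "GCGCGCGC" + "ATATATAT"
profile = multiscale_enrichment(gc_indicator(seq), widths=[4, 8, 16])

for width, values in profile.items():
    print(width, values)
```

In this toy case the enrichment is visible at widths 4 and 8 but averages out at width 16, which illustrates why a signal can look different depending on the scale at which it is examined.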