Department of Computational Biology, Graduate School of Frontier Sciences, The University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa-shi, Chiba 277-8561, Japan.
Nucleic Acids Res. 2012 Aug;40(14):6435-48. doi: 10.1093/nar/gks354. Epub 2012 May 4.
Due to dramatic advances in DNA technology, quantitative measures of annotation data can now be obtained in continuous coordinates across the entire genome, allowing various heterogeneous 'genomic landscapes' to emerge. Although much effort has been devoted to comparing DNA sequences, little attention has been given to comparing these large quantities of data comprehensively. In this article, we introduce a method for rapidly detecting local regions that show high correlations between genomic landscapes. We overcame the size problem for genome-wide data by converting the data into series of symbols and then carrying out sequence alignment. We also decomposed the oscillation of the landscape data into different frequency bands before analysis, since the real genomic landscape is a mixture of embedded and confounded biological processes working at different scales in the cell nucleus. To verify the usefulness and generality of our method, we applied it to well-investigated landscapes from the human genome, including several histone modifications. Furthermore, by applying our method to over 20 genomic landscapes in human and 12 in mouse, we found that DNA replication timing and the density of Alu insertions are highly correlated genome-wide in both species, even though the Alu elements have amplified independently in the two genomes. To our knowledge, this is the first method to align genomic landscapes at multiple scales according to their shape.
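The pipeline the abstract describes (split a landscape into frequency bands, convert each band into a series of symbols, then align the symbol series) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration, not the authors' implementation: the abstract does not specify the band decomposition, the symbol alphabet, or the alignment scoring. The sketch uses a moving average as a crude band split, a three-letter slope alphabet (`U`, `D`, `F`), and a plain Smith-Waterman local alignment; the function names `smooth`, `split_bands`, `symbolize` and `local_align` are hypothetical.

```python
from typing import List, Sequence, Tuple


def smooth(values: Sequence[float], window: int) -> List[float]:
    """Centered moving average: a crude low-pass filter (assumed, not the paper's)."""
    half = window // 2
    out = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out


def split_bands(values: Sequence[float], window: int) -> Tuple[List[float], List[float]]:
    """Split a landscape into a low-frequency band and the high-frequency residual."""
    low = smooth(values, window)
    high = [v - l for v, l in zip(values, low)]
    return low, high


def symbolize(values: Sequence[float], eps: float = 0.05) -> str:
    """Encode the local slope as U (up), D (down) or F (flat); hypothetical alphabet."""
    symbols = []
    for prev, cur in zip(values, values[1:]):
        delta = cur - prev
        symbols.append("U" if delta > eps else "D" if delta < -eps else "F")
    return "".join(symbols)


def local_align(a: str, b: str, match: int = 2, mismatch: int = -1,
                gap: int = -2) -> Tuple[int, Tuple[int, int], Tuple[int, int]]:
    """Smith-Waterman local alignment with linear gap penalty.

    Returns (best score, (a_start, a_end), (b_start, b_end)) as half-open ranges.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(0, diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)
    # Trace back from the best cell until the score drops to zero.
    i, j = best_pos
    while i > 0 and j > 0 and score[i][j] > 0:
        diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
        if score[i][j] == diag:
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return best, (i, best_pos[0]), (j, best_pos[1])


# Two toy landscapes with different baselines but a shared rise-then-fall shape.
track_a = [0.0, 0.1, 0.5, 1.0, 0.6, 0.2, 0.1, 0.1]
track_b = [3.0, 3.0, 2.0, 2.4, 2.9, 2.5, 2.1, 2.0]

score, range_a, range_b = local_align(symbolize(track_a), symbolize(track_b))
print(score, range_a, range_b)
```

Because the symbols encode slope rather than absolute value, the two toy tracks align on their common shape despite their different baselines. In a multi-scale setting one would, under the same assumptions, call `split_bands` first and run `symbolize` and `local_align` on each band separately.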