Faculty of Mathematics, Informatics and Mechanics, University of Warsaw, 00-927 Warszawa, Poland.
Int J Mol Sci. 2021 Jul 29;22(15):8123. doi: 10.3390/ijms22158123.
The explosive development of next-generation sequencing technologies has given us an unprecedented view of many molecular signatures of the non-coding genome. In particular, ChIP-seq (chromatin immunoprecipitation followed by sequencing) is now routinely used to identify the proteins associated with different non-coding DNA regions genome-wide. While the analysis of such data for transcription factor binding is relatively straightforward, data for many histone modifications, such as H3K27me3, are very important for understanding gene regulation but much harder to interpret. We propose a novel method, called HERON (HiddEn MaRkov mOdel based peak calliNg), for genome-wide data analysis that detects DNA regions enriched for a given feature, even in the difficult setting of long, weakly enriched DNA domains. We demonstrate the performance of our method on both simulated and experimental data.
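To illustrate the general idea of hidden Markov model based domain calling, the following is a minimal sketch and not the HERON implementation. It assumes a two-state model (background and enriched) over binned read counts, Poisson emissions, and fixed parameters; the function names, the rates, the `stay` probability, and the bin size are all hypothetical choices made for this example, since the abstract does not specify them. Viterbi decoding finds the most likely state path, and consecutive enriched bins are merged into domains.

```python
import math


def viterbi_two_state(counts, rates=(2.0, 6.0), stay=0.95, start=(0.5, 0.5)):
    """Most likely background (0) / enriched (1) state per bin.

    Assumed model (illustrative only): Poisson emissions with the given
    mean read counts per bin, and a symmetric transition matrix in which
    each state persists with probability `stay`.
    """
    if not counts:
        return []

    log_trans = [
        [math.log(stay if i == j else 1.0 - stay) for j in range(2)]
        for i in range(2)
    ]

    def log_poisson(k, lam):
        # log of the Poisson probability mass function
        return k * math.log(lam) - lam - math.lgamma(k + 1)

    # Initialise with the first bin.
    score = [math.log(start[s]) + log_poisson(counts[0], rates[s]) for s in range(2)]
    back = []  # back[t][s] = best predecessor of state s at bin t + 1

    for k in counts[1:]:
        new_score, pointers = [], []
        for s in range(2):
            best_prev = max(range(2), key=lambda p: score[p] + log_trans[p][s])
            pointers.append(best_prev)
            new_score.append(
                score[best_prev] + log_trans[best_prev][s] + log_poisson(k, rates[s])
            )
        score = new_score
        back.append(pointers)

    # Trace the best path backwards from the best final state.
    state = max(range(2), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


def states_to_domains(path, bin_size=1000):
    """Merge runs of enriched bins into half-open (start, end) intervals."""
    domains = []
    run_start = None
    for i, s in enumerate(path):
        if s == 1 and run_start is None:
            run_start = i
        elif s != 1 and run_start is not None:
            domains.append((run_start * bin_size, i * bin_size))
            run_start = None
    if run_start is not None:
        domains.append((run_start * bin_size, len(path) * bin_size))
    return domains


if __name__ == "__main__":
    # Synthetic counts: a long, moderately enriched stretch between background bins.
    counts = [1] * 20 + [7] * 20 + [1] * 20
    print(states_to_domains(viterbi_two_state(counts)))
```

Because switching states carries a penalty, a single slightly elevated bin tends to stay labelled as background, while a long run of modestly elevated bins is called as one domain. This is the property that makes this family of models attractive for broad marks; a real peak caller would also estimate the parameters from the data rather than fix them.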