Gómez-Martín Cristina, Lebrón Ricardo, Oliver José L, Hackenberg Michael
Department of Genetics, Faculty of Science, University of Granada, Granada, Spain.
Lab. de Bioinformática, Centro de Investigación Biomédica, PTS, Instituto de Biotecnología, Granada, Spain.
Methods Mol Biol. 2018;1766:31-47. doi: 10.1007/978-1-4939-7768-0_3.
The promoter region of around 70% of all genes in the human genome is overlapped by a CpG island (CGI). CGIs have known functions in transcription initiation and distinctive compositional features, such as high G+C content and CpG ratios, compared with bulk DNA. We have shown previously that CGIs manifest as clusters of CpGs in mammalian genomes and can therefore be detected using clustering methods. These techniques have several advantages over sliding-window approaches, which apply compositional properties as thresholds. In this protocol we show how to determine local (CpG islands) and global (distance distribution) clustering properties of CG dinucleotides and how to generalize this analysis to any k-mer or combination of k-mers. In addition, we illustrate how to intersect the output of a CpG island prediction algorithm with our methylation database to detect differentially methylated CGIs. The analysis is presented as a step-by-step protocol, and all necessary programs are provided in a virtual machine; alternatively, the software can be downloaded and installed.
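The idea of local and global clustering of CG dinucleotides, generalized to any k-mer, can be illustrated with a minimal Python sketch. This is not the authors' software: the function names, the start-to-start distance definition, and the fixed `max_dist` threshold are all assumptions made for illustration. In practice the threshold would be chosen from the observed distance distribution rather than hard-coded.

```python
import re


def kmer_positions(seq, kmer="CG"):
    """Return 0-based start positions of every occurrence of kmer.

    A lookahead pattern is used so that overlapping occurrences
    (relevant for k-mers such as "AA") are also reported.
    """
    pattern = re.compile("(?=" + re.escape(kmer.upper()) + ")")
    return [m.start() for m in pattern.finditer(seq.upper())]


def neighbor_distances(positions):
    """Global property: start-to-start distances between neighboring k-mers."""
    return [b - a for a, b in zip(positions, positions[1:])]


def distance_clusters(positions, max_dist, min_count=2, kmer_len=2):
    """Local property: group consecutive k-mers no more than max_dist apart.

    Returns (start, end, count) tuples with a half-open [start, end) interval.
    max_dist and min_count are illustrative parameters (assumptions).
    """
    clusters = []
    if not positions:
        return clusters
    start = prev = positions[0]
    count = 1
    for pos in positions[1:]:
        if pos - prev <= max_dist:
            count += 1
        else:
            # The gap is too large: close the current group.
            if count >= min_count:
                clusters.append((start, prev + kmer_len, count))
            start = pos
            count = 1
        prev = pos
    if count >= min_count:
        clusters.append((start, prev + kmer_len, count))
    return clusters


if __name__ == "__main__":
    # Toy sequence: two tight CG groups and one isolated CG.
    toy = "ACGCG" + "T" * 10 + "CGACG" + "T" * 20 + "CG"
    cg = kmer_positions(toy, "CG")
    print("positions:", cg)
    print("distances:", neighbor_distances(cg))
    print("clusters:", distance_clusters(cg, max_dist=5))
```

Passing a different `kmer` (and the matching `kmer_len`) applies the same distance-based grouping to other words; combining several k-mers would require merging their position lists first, which this sketch leaves out.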