Han Leng, Zhao Zhongming
Virginia Institute for Psychiatric and Behavioral Genetics, Virginia Commonwealth University, Richmond, VA 23298, USA.
BMC Bioinformatics. 2009 Feb 20;10:65. doi: 10.1186/1471-2105-10-65.
CpG islands (CGIs), clusters of CpG dinucleotides in GC-rich regions, are often located at the 5' end of genes and are considered gene markers. Hackenberg et al. (2006) recently developed a new algorithm, CpGcluster, which uses a completely different mathematical approach from the traditional algorithms. Their evaluation suggests that CpGcluster is a much more efficient way to detect functional clusters or islands of CpGs.
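To make "clusters of CpG dinucleotides in GC-rich regions" concrete, the sketch below computes the two sequence statistics that traditional island finders usually rely on: GC content and the observed/expected CpG ratio. The function names (`cpg_stats`, `meets_sequence_criteria`) are hypothetical, and the default thresholds (length of at least 500 bp, GC content of at least 55%, observed/expected ratio of at least 0.65) are values commonly cited for stricter sequence-criteria definitions; they are an assumption here, since this abstract does not state them. This is only a per-sequence check, not a full island-finding algorithm.

```python
def cpg_stats(seq: str) -> dict:
    """Return length, GC content and observed/expected CpG ratio of a DNA sequence."""
    s = seq.upper()
    n = len(s)
    if n == 0:
        raise ValueError("sequence must not be empty")
    c = s.count("C")
    g = s.count("G")
    cpg = s.count("CG")  # "CG" cannot overlap itself, so count() sees every CpG
    # Expected CpG count if C and G were placed independently: (#C * #G) / N
    expected = c * g / n
    return {
        "length": n,
        "gc_content": (c + g) / n,
        "obs_exp": cpg / expected if expected > 0 else 0.0,
    }


def meets_sequence_criteria(seq: str,
                            min_len: int = 500,     # assumed threshold
                            min_gc: float = 0.55,   # assumed threshold
                            min_oe: float = 0.65) -> bool:  # assumed threshold
    """Check one candidate region against length, GC and obs/exp thresholds."""
    if len(seq) < min_len:
        return False
    stats = cpg_stats(seq)
    return stats["gc_content"] >= min_gc and stats["obs_exp"] >= min_oe
```

A real island finder would slide a window along the genome and merge qualifying windows; the check above only shows what each candidate region is tested against.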
We systematically compared CpGcluster with the traditional algorithm by Takai and Jones (2002). Our comparisons of (1) the number of islands versus the number of genes in a genome, (2) the distribution of islands in different genomic regions, (3) island length, (4) the distance between two neighboring islands, and (5) methylation status suggest that Takai and Jones' algorithm is overall more appropriate for identifying promoter-associated islands of CpGs in vertebrate genomes.
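Comparisons (3) and (4) above reduce to simple interval arithmetic once each algorithm's predictions are available as coordinates. The following is a minimal sketch, assuming islands on one chromosome are given as half-open `(start, end)` pairs; the function name `island_summary` and the example coordinates are hypothetical and are not data from the study.

```python
from statistics import median


def island_summary(islands):
    """Summarize island lengths and gaps between neighboring islands.

    `islands` is a list of half-open (start, end) intervals on a single
    chromosome; overlapping islands are reported with a gap of 0.
    """
    ordered = sorted(islands)
    lengths = [end - start for start, end in ordered]
    # Gap = start of the next island minus end of the current one
    gaps = [
        max(0, nxt[0] - cur[1])
        for cur, nxt in zip(ordered, ordered[1:])
    ]
    return {
        "count": len(ordered),
        "lengths": lengths,
        "gaps": gaps,
        "median_length": median(lengths) if lengths else None,
        "median_gap": median(gaps) if gaps else None,
    }


# Hypothetical coordinates, for illustration only
example = [(1500, 2100), (100, 700), (5000, 5600)]
summary = island_summary(example)
```

Running the same summary on the output of two algorithms for the same chromosome gives directly comparable length and spacing distributions.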
The generation of genome sequence and DNA methylation data is expected to accelerate greatly. The findings of this study are therefore broadly useful for gene feature analysis and epigenomics, including gene prediction and methylation chip design in different genomes.