Wang Yingwei, Hill Kathleen, Singh Shiva, Kari Lila
Department of Computer Science, University of Western Ontario, London, Ontario, Canada N6A 5B7.
Gene. 2005 Feb 14;346:173-85. doi: 10.1016/j.gene.2004.10.021.
In the post-genomic era, access to complete genome sequence data for numerous diverse species has opened multiple avenues for examining and comparing the primary DNA sequence organization of entire genomes. Previously, the concept of a genomic signature was introduced with the observation of species-type-specific Dinucleotide Relative Abundance Profiles (DRAPs); dinucleotides were identified as the subsequences with the greatest bias in representation in a majority of genomes. Herein, we demonstrate that DRAP is one particular genomic signature contained within a broader spectrum of signatures. Within this spectrum, an alternative genomic signature, Chaos Game Representation (CGR), provides a unique visualization of patterns in sequence organization. Each genomic signature is associated with a particular integer order, or subsequence length, which measures the resolution or granularity at which primary DNA sequence organization is analyzed. We quantitatively explore the organizational information provided by genomic signatures of different orders using several distance measures, including a novel Image Distance. The Image Distance and other existing distance measures are evaluated by comparing the phylogenetic trees they generate for 26 complete mitochondrial genomes from a diversity of species. The phylogenetic tree generated by the Image Distance is compatible with the known relatedness of species. Quantitative evaluation of the spectrum of genomic signatures may ultimately yield insight into the determinants and biological relevance of genomic signatures.
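To make the notions of CGR and signature order concrete, the following is a minimal Python sketch rather than the authors' implementation. It assumes a commonly used corner layout for the four bases (the abstract does not specify one), and the function names `cgr_points`, `kmer_frequencies`, and `euclidean_distance` are hypothetical. The distance shown is a plain Euclidean distance between order-k word frequencies, used only as a stand-in; it is not the Image Distance introduced in the paper.

```python
from itertools import product

# Assumed corner layout for the unit square (a common convention,
# not taken from the abstract).
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_points(seq):
    """Chaos game: each new point lies midway between the previous
    point and the corner assigned to the current base."""
    x, y = 0.5, 0.5  # start at the centre of the square
    points = []
    for base in seq.upper():
        if base not in CORNERS:
            continue  # skip ambiguous symbols such as N
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2, (y + cy) / 2
        points.append((x, y))
    return points


def kmer_frequencies(seq, k):
    """Relative frequency of every length-k word over ACGT.
    k plays the role of the signature order (k = 2 gives dinucleotides)."""
    seq = seq.upper()
    counts = {"".join(p): 0 for p in product("ACGT", repeat=k)}
    total = 0
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in counts:  # ignores words containing non-ACGT symbols
            counts[word] += 1
            total += 1
    return {w: (c / total if total else 0.0) for w, c in counts.items()}


def euclidean_distance(f1, f2):
    """Stand-in distance between two frequency tables of the same order.
    This is NOT the paper's Image Distance."""
    return sum((f1[w] - f2[w]) ** 2 for w in f1) ** 0.5
```

Calling `kmer_frequencies(seq, k)` with increasing `k` gives progressively finer-grained signatures of the same sequence, which is the sense in which order acts as a resolution. A pairwise distance matrix built this way over a set of genomes could then be passed to any standard tree-building method.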