Department of Mathematics, The Chinese University of Hong Kong, Shatin, Hong Kong.
DNA Res. 2010 Jun;17(3):155-68. doi: 10.1093/dnares/dsq008. Epub 2010 Apr 1.
A genome space is a moduli space of genomes: each point in the space corresponds to a genome, and the natural distance between two points reflects the biological distance between the two genomes. Currently, there is no method for representing a genome as a point in a space without losing biological information. Here, we propose a new graphical representation for DNA sequences. The key advance is that we can construct moment vectors from DNA sequences using this graphical method and prove that the correspondence between moment vectors and DNA sequences is one-to-one. Using these moment vectors, we have constructed a novel genome space as a subspace of R^N. This space allows us to show that SARS-CoV is most closely related to a coronavirus from the palm civet, not from a bird as initially suspected, and that the newly discovered human coronavirus HCoV-HKU1 is more closely related to SARS-CoV than to any other known member of the group 2 coronaviruses. Furthermore, we reconstructed the phylogenetic tree for 34 lentiviruses (including human immunodeficiency virus) based on their whole genome sequences. Our genome space will provide a powerful new tool for classifying genomes and analyzing their phylogenetic relationships.
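As a rough illustration of the pipeline the abstract outlines (sequence, then curve, then moment vector, then distance in R^N), the following Python sketch may help. It is not the authors' construction: the abstract does not define the graphical representation or the moments, so the step vectors in `STEPS`, the moment formula in `moment_vector`, and all function names are assumptions chosen only for illustration. This toy version makes no claim to the one-to-one property proved in the paper.

```python
import math

# ASSUMPTION: hypothetical step vectors, one per nucleotide. The abstract
# does not specify the actual mapping used in the paper.
STEPS = {
    "A": (1.0, -1.0),
    "C": (1.0, -0.5),
    "G": (1.0, 0.5),
    "T": (1.0, 1.0),
}


def curve(seq):
    """Turn a DNA string into a 2D curve by accumulating step vectors."""
    x = y = 0.0
    points = []
    for base in seq.upper():
        dx, dy = STEPS[base]  # raises KeyError for symbols other than A, C, G, T
        x += dx
        y += dy
        points.append((x, y))
    return points


def moment_vector(seq, n_moments=4):
    """Summarize the curve as a fixed-length vector of normalized power sums.

    ASSUMPTION: the j-th entry is sum((x - y)**j) / n**j over the curve's
    points, an illustrative choice rather than the paper's definition.
    """
    points = curve(seq)
    n = len(points)
    if n == 0:
        raise ValueError("sequence must not be empty")
    return [
        sum((x - y) ** j for x, y in points) / n ** j
        for j in range(1, n_moments + 1)
    ]


def distance(u, v):
    """Euclidean distance between two moment vectors of equal length."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


if __name__ == "__main__":
    v1 = moment_vector("ACGTACGT")
    v2 = moment_vector("ACGTTCGT")
    print(distance(v1, v2))
```

Because every sequence maps to a vector of the same length regardless of its own length, sequences can be compared with an ordinary vector distance and no alignment step. In this sketch, the pairwise distances would then feed a standard tree-building method to obtain a phylogeny.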