Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT 06269, USA.
BMC Genomics. 2009 Dec 23;10:630. doi: 10.1186/1471-2164-10-630.
Accurate determination of orthology is central to comparative genomics. For vertebrates in particular, very large gene families, high rates of gene duplication and loss, multiple mechanisms of gene duplication, and high rates of retrotransposition all combine to make inference of orthology between genes difficult. Many methods have been developed to identify orthologous genes, mostly based upon analysis of the inferred protein sequence of the genes. More recently, methods have been proposed that use genomic context in addition to protein sequence to improve orthology assignment in vertebrates. Such methods have been most successfully implemented in fungal genomes and have long been used in prokaryotic genomes, where gene order is far less variable than in vertebrates. However, to our knowledge, no explicit comparison of synteny-based and sequence-based definitions of orthology has been reported in vertebrates, or, more specifically, in mammals.
We test a simple method for measuring and using gene order (local synteny) to identify mammalian orthologs by investigating the agreement between coding-sequence-based orthology (Inparanoid) and local-synteny-based orthology. In the five mammalian genomes studied, 93% of the sampled inter-species pairs were concordant between the two orthology methods, illustrating that local synteny is a robust substitute for coding sequence in identifying orthologs. However, 7% of pairs were discordant between local synteny and Inparanoid. These cases of discordance result from evolutionary events including retrotransposition and genome rearrangements.
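The idea of scoring local synteny for a candidate gene pair can be illustrated with a minimal sketch. This is not the authors' implementation: the window size `k`, the neighbor-matching rule, the `min_shared` threshold, and all gene names and function names below are assumptions introduced purely for illustration. The sketch counts how many flanking genes of one gene have a homolog among the flanking genes of the other, and shows how a retrocopy inserted into an unrelated region would score zero.

```python
from typing import FrozenSet, List, Set


def flanking(order: List[str], gene: str, k: int) -> List[str]:
    """Return up to k genes on each side of `gene` in a chromosome's gene order."""
    i = order.index(gene)
    return order[max(0, i - k):i] + order[i + 1:i + 1 + k]


def shared_neighbors(order_a: List[str], gene_a: str,
                     order_b: List[str], gene_b: str,
                     homologs: Set[FrozenSet[str]], k: int = 3) -> int:
    """Count flanking genes of gene_a that have a homolog flanking gene_b.

    Each neighbor of gene_b is matched at most once (assumed rule).
    """
    neighbors_b = flanking(order_b, gene_b, k)
    used = set()
    count = 0
    for x in flanking(order_a, gene_a, k):
        for y in neighbors_b:
            if y not in used and frozenset((x, y)) in homologs:
                used.add(y)
                count += 1
                break
    return count


def is_syntenic(order_a: List[str], gene_a: str,
                order_b: List[str], gene_b: str,
                homologs: Set[FrozenSet[str]],
                k: int = 3, min_shared: int = 1) -> bool:
    """Call a pair syntenic if enough neighbors match (threshold is an assumption)."""
    return shared_neighbors(order_a, gene_a, order_b, gene_b, homologs, k) >= min_shared


# Hypothetical toy data: h1..h7 on a human chromosome, m1..m7 on a mouse one,
# and a mouse retrocopy "m4r" of gene 4 sitting among unrelated genes x1..x4.
human_chr = [f"h{i}" for i in range(1, 8)]
mouse_chr = [f"m{i}" for i in range(1, 8)]
mouse_other = ["x1", "x2", "m4r", "x3", "x4"]

# Sequence-level homology: each h_i matches m_i, and h4 also matches the retrocopy.
homologs = {frozenset((f"h{i}", f"m{i}")) for i in range(1, 8)}
homologs.add(frozenset(("h4", "m4r")))

print(shared_neighbors(human_chr, "h4", mouse_chr, "m4", homologs))
print(shared_neighbors(human_chr, "h4", mouse_other, "m4r", homologs))
```

In this toy setting, sequence similarity alone cannot separate `m4` from `m4r` as the partner of `h4`, whereas the neighbor count does, because only `m4` retains the ancestral gene neighborhood.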
By analyzing cases of discordance between local synteny and Inparanoid we show that local synteny can distinguish between true orthologs and recent retrogenes, can resolve ambiguous many-to-many orthology relationships into one-to-one ortholog pairs, and might be used to identify cases of non-orthologous gene displacement by retroduplicated paralogs.