De Grassi Anna, Lanave Cecilia, Saccone Cecilia
Istituto di Tecnologie Biomediche, Sede di Bari, CNR, Bari, Italy.
Gene. 2008 Sep 15;421(1-2):1-6. doi: 10.1016/j.gene.2008.05.011. Epub 2008 Jun 23.
DNA duplication is one of the main forces driving the evolution of organisms because it creates the raw genetic material that natural selection can subsequently modify. Duplicated regions arise mainly from "errors" in different phases of meiosis, but DNA transposable elements and reverse transcription also contribute by amplifying genomic material and moving it to different genomic locations. As a result, redundancy affects genomes to variable degrees, from a single gene to the whole genome (whole-genome duplication, WGD). Gene families are clusters of genes created by duplication, and their size reflects the number of duplicated genes, called paralogs, in each species. The aim of this review is to describe the state of the art in the identification and analysis of gene families in eukaryotes, with specific attention to those generated by ancient large-scale events in vertebrates (WGD or large segmental duplications). As a case study, we report our work on the evolution of gene families encoding subunits of the five OXPHOS (oxidative phosphorylation) complexes, which are fundamental and highly conserved in all respiring cells. Although OXPHOS gene families are generally smaller than nuclear gene families as a whole, some exceptions are observed, such as three gene families with at least two paralogs in vertebrates. These gene families encode cytochrome c (Cyt c, the electron shuttle protein between complexes III and IV), Lipid Binding Protein (LBP, the channel protein of complex V, which transfers protons through the inner mitochondrial membrane) and the MLRQ subunit (MLRQ, a supernumerary subunit of the large complex I, with unknown function). We provide a two-step approach, based on structural genomic data, to show that these gene families most likely arose through WGD (or large segmental duplication) events at the origin of vertebrates and only afterwards underwent species-specific events of further gene duplication and loss.
In summary, this review reflects the need to apply comparative genomic approaches, drawing on both "classical" molecular phylogenetic analysis and "new" genome map analysis, to successfully define the complex evolutionary relationships between gene family members. Defining these relationships is, in turn, essential for obtaining any other comparative phylogenetic or functional results.
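The abstract defines gene family size as the number of paralogs a family has in each species. A minimal Python sketch of that bookkeeping is given below. It is only an illustration, not the authors' pipeline: the species labels, gene identifiers and copy counts are invented placeholders, and the helper names `family_sizes` and `families_with_paralogs` are hypothetical. Only the three family names are taken from the abstract.

```python
from collections import defaultdict

# Hypothetical toy input: (species, family, gene_id) assignments.
# Species labels, gene IDs and copy counts are invented for illustration
# only; they are NOT data from the review.
ASSIGNMENTS = [
    ("species_A", "Cyt c", "a_g1"),
    ("species_A", "Cyt c", "a_g2"),
    ("species_A", "MLRQ", "a_g3"),
    ("species_B", "Cyt c", "b_g1"),
    ("species_B", "MLRQ", "b_g2"),
    ("species_B", "MLRQ", "b_g3"),
    ("species_B", "LBP", "b_g4"),
]


def family_sizes(assignments):
    """Return {family: {species: number of distinct member genes}}."""
    members = defaultdict(lambda: defaultdict(set))
    for species, family, gene_id in assignments:
        members[family][species].add(gene_id)
    return {
        family: {species: len(genes) for species, genes in per_species.items()}
        for family, per_species in members.items()
    }


def families_with_paralogs(sizes, min_copies=2):
    """List (family, species) pairs holding at least min_copies genes."""
    return sorted(
        (family, species)
        for family, per_species in sizes.items()
        for species, n in per_species.items()
        if n >= min_copies
    )


sizes = family_sizes(ASSIGNMENTS)
print(sizes)
print(families_with_paralogs(sizes))
```

Counting copies per species only flags candidate families. Deciding whether the paralogs trace back to a WGD or to later species-specific duplications still requires the phylogenetic and genome map analyses the review describes.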