CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100029, PR China.
Biol Direct. 2011 Feb 22;6:13. doi: 10.1186/1745-6150-6-13.
Mammalian genome sequence data are being acquired in large quantities and at enormous speeds. We now have a tremendous opportunity to better understand which genes are the most variable or conserved, and what their particular functions and evolutionary dynamics are, through comparative genomics.
We chose human and eleven other high-coverage mammalian genomes, with an avian genome as an outgroup, to analyze orthologous protein-coding genes using nonsynonymous (Ka) and synonymous (Ks) substitution rates. After evaluating eight commonly used methods of Ka and Ks calculation, we observed that these methods yielded nearly uniform results when estimating Ka, but not Ks (or Ka/Ks). When sorting genes based on Ka, we noticed that fast-evolving and slow-evolving genes often belonged to different functional classes, with respect to species-specificity and lineage-specificity. In particular, we identified two functional classes of genes in the acquired immune system. Fast-evolving genes coded for signal-transducing proteins, such as receptors, ligands, cytokines, and CDs (cluster of differentiation, mostly surface proteins), whereas slow-evolving genes coded for function-modulating proteins, such as kinases and adaptor proteins. In addition, among slow-evolving genes that had functions related to the central nervous system, neurodegenerative disease-related pathways were significantly enriched in most mammalian species. We also confirmed that gene expression was negatively correlated with evolution rate, i.e. slow-evolving genes were expressed at higher levels than fast-evolving genes. Our results indicated that the functional specializations of the three major mammalian clades were: sensory perception and oncogenesis in primates, reproduction and hormone regulation in large mammals, and immunity and angiotensin in rodents.
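The negative relationship between expression and evolution rate reported above can be illustrated with a rank correlation. The sketch below is only an illustration: the abstract does not say which statistic was used, so Spearman's rho is an assumption, and the Ka and expression values are invented toy numbers, not data from the study.

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = ranks(x), ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Toy values (assumed for illustration): one Ka and one expression level per gene.
ka = [0.01, 0.02, 0.05, 0.10, 0.20]
expression = [900.0, 400.0, 450.0, 80.0, 15.0]

rho = spearman(ka, expression)
print(f"Spearman rho = {rho:.2f}")  # negative: higher Ka, lower expression
```

A rank-based statistic is used here because Ka and expression levels are typically skewed, so a rank correlation is less sensitive to a few extreme genes than a linear one.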
Our study suggests that Ka, whose calculation is less biased than that of Ks and Ka/Ks, can be used as a parameter to sort genes by evolution rate. It can also provide a way to categorize common protein functions and define their interaction networks, either pair-wise or in defined lineages or subgroups. Evaluating gene evolution based on Ka and Ks calculations can be done with large datasets, such as mammalian genomes.
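Sorting genes by Ka, as suggested above, might look like the following minimal sketch. The gene names, Ka values, and the 20% cutoff are hypothetical; the abstract does not state what threshold separates fast-evolving from slow-evolving genes.

```python
def split_by_ka(ka_by_gene, fraction=0.1):
    """Sort genes by Ka and return (slow_evolving, fast_evolving) gene lists.

    `fraction` is an assumed cutoff: the share of genes taken from each end
    of the Ka-sorted list. It is not a value taken from the study.
    """
    if not 0 < fraction <= 0.5:
        raise ValueError("fraction must be in (0, 0.5]")
    ranked = sorted(ka_by_gene, key=ka_by_gene.get)  # ascending Ka
    n = max(1, int(len(ranked) * fraction))
    return ranked[:n], ranked[-n:]


# Toy pairwise Ka values (invented) for one species pair.
toy_ka = {
    "gene_a": 0.002, "gene_b": 0.310, "gene_c": 0.045, "gene_d": 0.120,
    "gene_e": 0.008, "gene_f": 0.250, "gene_g": 0.060, "gene_h": 0.015,
    "gene_i": 0.090, "gene_j": 0.030,
}

slow, fast = split_by_ka(toy_ka, fraction=0.2)
print("slow-evolving:", slow)
print("fast-evolving:", fast)
```

The two resulting gene lists could then be passed to a functional enrichment tool to compare the classes found at each end of the distribution.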
This article has been reviewed by Drs. Anamaria Necsulea (nominated by Nicolas Galtier), Subhajyoti De (nominated by Sarah Teichmann) and Claus O. Wilke.