Department of Evolutionary Biology, Evolutionary Biology Centre, Uppsala University, Uppsala, Sweden.
Mol Biol Evol. 2011 Aug;28(8):2197-210. doi: 10.1093/molbev/msr047. Epub 2011 Apr 4.
Resolving the phylogenetic relationships among birds is a classical problem in systematics, particularly when it comes to the relationships among Neoaves. Previous phylogenetic inference of birds has been limited to mitochondrial genomes or a few nuclear genes. Here, we apply deep brain transcriptome sequencing to nine bird species (several passerines, hummingbirds, dove, parrot, and emu) using next-generation sequencing technology, to understand features of transcriptome evolution in birds and how these affect phylogenetic inference, and we combine the results with data from two bird species obtained using first-generation technology. The phylogenomic data matrix comprises 1,995 genes and a total of 0.77 Mb of exonic sequence. First, we find an unexpected heterogeneity in the evolution of base composition among avian lineages. There is a pronounced increase in guanine + cytosine (GC) content in the third codon position in several independent lineages, with the strongest effect seen in passerines. Second, we evaluate the effect of GC content variation on phylogenetic reconstruction. We find important inconsistencies between the topologies obtained with or without taking GC variation into account, each supporting different conclusions of past studies and also influencing hypotheses on the evolution of the trait of vocal learning. Third, we demonstrate a link between GC content evolution and recombination rate and, focusing on the zebra finch lineage, find that recombination seems to drive GC content. Although we cannot reveal the causal relationships, this observation is consistent with the model of GC-biased gene conversion. Finally, we use this unparalleled amount of avian sequence data to study the rate of molecular evolution, calibrated by fossil evidence and augmented with data from alligator transcriptome sequencing.
There is a 2- to 3-fold variation in substitution rate among lineages with passerines being the most rapidly evolving and ratites the slowest. This study illustrates the potential of next-generation sequencing for phylogenomic studies but also the pitfalls when using genome-wide data with heterogeneous base composition.
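The GC content at the third codon position (often abbreviated GC3) that the abstract refers to can be illustrated with a short calculation. The following is only a minimal sketch, assuming an in-frame coding sequence given as a plain string; the function name `gc3` and the handling of ambiguous bases are illustrative assumptions, not the procedure used in the study.

```python
def gc3(cds: str) -> float:
    """Fraction of G or C among third codon positions.

    Assumes `cds` is an in-frame coding sequence (hypothetical input format).
    Trailing bases that do not complete a codon are ignored, and third
    positions that are not A, C, G, or T (e.g. N) are skipped.
    """
    seq = cds.upper()
    usable = len(seq) - len(seq) % 3          # drop an incomplete final codon
    third = [seq[i] for i in range(2, usable, 3)]
    called = [base for base in third if base in "ACGT"]
    if not called:
        raise ValueError("no unambiguous third-position bases")
    return sum(base in "GC" for base in called) / len(called)


if __name__ == "__main__":
    # Codons ATG, GCC, AAA have third positions G, C, A.
    print(gc3("ATGGCCAAA"))
```

Comparing this value across orthologous genes from different lineages is one simple way to see the kind of base-composition heterogeneity described above; a real analysis would also need alignment and quality filtering.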