Yang Ya, Moore Michael J, Brockington Samuel F, Soltis Douglas E, Wong Gane Ka-Shu, Carpenter Eric J, Zhang Yong, Chen Li, Yan Zhixiang, Xie Yinlong, Sage Rowan F, Covshoff Sarah, Hibberd Julian M, Nelson Matthew N, Smith Stephen A
Department of Ecology & Evolutionary Biology, University of Michigan
Department of Biology, Oberlin College, Science Center K111, Oberlin, OH.
Mol Biol Evol. 2015 Aug;32(8):2001-14. doi: 10.1093/molbev/msv081. Epub 2015 Apr 2.
Many phylogenomic studies based on transcriptomes have been limited to "single-copy" genes due to methodological challenges in homology and orthology inferences. Only a relatively small number of studies have explored analyses beyond reconstructing species relationships. We sampled 69 transcriptomes in the hyperdiverse plant clade Caryophyllales and 27 outgroups from annotated genomes across eudicots. Using a combined similarity- and phylogenetic tree-based approach, we recovered 10,960 homolog groups, where each was represented by at least eight ingroup taxa. By decomposing these homolog trees, and taking gene duplications into account, we obtained 17,273 ortholog groups, where each was represented by at least ten ingroup taxa. We reconstructed the species phylogeny using a 1,122-gene data set with a gene occupancy of 92.1%. From the homolog trees, we found that both synonymous and nonsynonymous substitution rates in herbaceous lineages are up to three times as fast as in their woody relatives. This is the first time such a pattern has been shown across thousands of nuclear genes with dense taxon sampling. We also pinpointed regions of the Caryophyllales tree that were characterized by relatively high frequencies of gene duplication, including three previously unrecognized whole-genome duplications. By further combining information from homolog tree topology and synonymous distance between paralog pairs, phylogenetic locations for 13 putative genome duplication events were identified. Genes that experienced the greatest gene family expansion were concentrated among those involved in signal transduction and oxidoreduction, including a cytochrome P450 gene that encodes a key enzyme in the betalain synthesis pathway. Our study demonstrates a new approach for functional phylogenomic analysis in nonmodel species that is based on homolog groups in addition to inferred ortholog groups.
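The two data-set criteria mentioned above (a minimum number of ingroup taxa per group, and gene occupancy of the final matrix) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names, the toy data, and the definition of occupancy as the fraction of filled cells in the gene-by-taxon matrix are assumptions made for illustration only.

```python
# Illustrative sketch only; names and toy data are hypothetical,
# and "occupancy" is assumed to mean filled cells / total cells.

def filter_by_min_ingroup(groups, ingroup, min_taxa):
    """Keep groups represented by at least `min_taxa` distinct ingroup taxa."""
    return {
        name: members
        for name, members in groups.items()
        if len(set(members) & ingroup) >= min_taxa
    }


def gene_occupancy(groups, taxa):
    """Fraction of gene-by-taxon cells that contain at least one sequence."""
    taxa = set(taxa)
    if not groups or not taxa:
        return 0.0
    filled = sum(len(set(members) & taxa) for members in groups.values())
    return filled / (len(groups) * len(taxa))


# Toy example: three ingroup taxa, one outgroup, three candidate groups.
ingroup = {"A", "B", "C"}
groups = {
    "g1": ["A", "B", "C", "out1"],
    "g2": ["A", "out1"],
    "g3": ["A", "B"],
}

kept = filter_by_min_ingroup(groups, ingroup, min_taxa=2)
occupancy = gene_occupancy(kept, ingroup)
print(sorted(kept), round(occupancy, 3))
```

In this toy case the group with a single ingroup taxon is discarded, and the occupancy of the remaining matrix is computed over ingroup taxa only; whether outgroups are counted in the reported 92.1% figure is not stated in the abstract.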