Department of Forest Growth, Silviculture and Genetics, Austrian Research Centre for Forests, Vienna, Austria.
Komarov Botanical Institute, Russian Academy of Sciences, St. Petersburg, Russian Federation.
Mol Ecol Resour. 2022 Nov;22(8):3018-3034. doi: 10.1111/1755-0998.13684. Epub 2022 Jul 29.
The analysis of target enrichment data in phylogenetics lacks optimization toward using paralogues for phylogenetic reconstruction. We developed a novel approach of detecting paralogues and utilizing them for phylogenetic tree inference, by retrieving both ortho- and paralogous copies and creating orthologous alignments, from which the gene trees are built. We implemented this approach in ParalogWizard and demonstrate its performance in plant groups that underwent a whole genome duplication relatively recently: the subtribe Malinae (family Rosaceae), using Angiosperms353 as well as Malinae481 probes, the genus Oritrophium (family Asteraceae), using Compositae1061 probes, and the genus Amomum (family Zingiberaceae), using Zingiberaceae1180 probes. Discriminating between orthologues and paralogues reduced gene tree discordance and increased the species tree support in the case of the Malinae, but not for Oritrophium and Amomum. This may relate to the difference in the proportion of paralogous loci between the data sets, which was highest for the Malinae. Overall, retrieving paralogues for phylogenetic reconstruction following ParalogWizard has the potential to increase the species tree support and reduce gene tree discordance in target enrichment data, particularly if the proportion of paralogous loci is high.