IRHS, Agrocampus-Ouest, INRA, UNIV Angers, SFR 4207 QuaSaV, Beaucouzé, France.
ILVO, Flanders Research Institute for Agriculture, Fisheries and Food, Plant Sciences Unit, Melle, Belgium.
BMC Evol Biol. 2019 Jul 24;19(1):152. doi: 10.1186/s12862-019-1479-z.
With an ever-growing number of published genomes, many shallow levels of the Tree of Life now contain several species with enough molecular data to perform shallow-scale phylogenomic studies. Rather than relying on just a few universal phylogenetic markers, we can now target thousands of other loci to decipher relationships among taxa. Selecting the most informative sequences for the taxa under study has therefore emerged as a new issue. Here, we developed a general procedure to mine genomic data for orthologous single-copy loci capable of resolving phylogenetic relationships below the generic rank. To develop our strategy, we chose the genus Rosa, a rapidly evolving lineage of the Rosaceae family in which the genomes of several species have recently been sequenced. We also compared our loci to the conventional plastid markers commonly used for phylogenetic inference in this genus.
We generated 1856 sequence tags in putative single-copy orthologous nuclear loci. The associated in silico primer pairs can potentially amplify fragments able to resolve a wide range of speciation events within the genus Rosa. Analysis of parsimony-informative site content showed the value of non-coding genomic regions for obtaining variable sequences, although these regions may be more difficult to target in more distantly related species. Dozens of nuclear loci outperform the conventional plastid phylogenetic markers in terms of phylogenetic informativeness, for both recent and ancient evolutionary divergences. However, conflicting phylogenetic signals were found between nuclear gene tree topologies and the species-tree topology, shedding light on the many patterns of hybridization and/or incomplete lineage sorting that occur in the genus Rosa.
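As an illustration of the parsimony-informative site analysis mentioned above, the following is a minimal sketch and not the authors' pipeline. It assumes the usual definition (an alignment column is parsimony-informative when at least two character states each occur in at least two sequences) and assumes that gaps and ambiguous bases are ignored. The function name and the toy alignment are hypothetical.

```python
from collections import Counter


def count_parsimony_informative_sites(alignment):
    """Count parsimony-informative columns in a list of aligned sequences.

    Assumption: only unambiguous bases (A, C, G, T) are counted; gaps and
    ambiguity codes are skipped when tallying states in a column.
    """
    if not alignment:
        return 0
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all sequences must have the same aligned length")

    informative = 0
    for column in zip(*alignment):
        counts = Counter(b.upper() for b in column if b.upper() in "ACGT")
        # A site is informative if at least two states are each shared
        # by at least two sequences.
        shared_states = sum(1 for n in counts.values() if n >= 2)
        if shared_states >= 2:
            informative += 1
    return informative


# Hypothetical toy alignment of four sequences (not data from the study).
toy_alignment = [
    "ACGTAC",
    "ACGTAT",
    "ATGAAC",
    "ATG-AT",
]
n_informative = count_parsimony_informative_sites(toy_alignment)
```

In this toy alignment, only the second and last columns have two states that are each shared by two sequences. Dividing the count by the alignment length gives a per-locus proportion that could be compared between coding and non-coding regions.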
Using recently published genome sequence data, we developed a set of single-copy orthologous nuclear loci to resolve species-level phylogenomics in the genus Rosa. This genome-wide dataset contains hundreds of highly variable loci whose phylogenetic value was assessed in terms of phylogenetic informativeness and topological conflict. Our target identification procedure can easily be reproduced to identify new highly informative loci for other taxonomic groups and ranks.