Herklotz Veit, Kovařík Aleš, Wissemann Volker, Lunerová Jana, Vozárová Radka, Buschmann Sebastian, Olbricht Klaus, Groth Marco, Ritz Christiane M
Department of Botany, Senckenberg Museum of Natural History Görlitz, Görlitz, Germany.
Department of Molecular Epigenetics, Institute of Biophysics, Academy of Sciences of the Czech Republic, Brno, Czechia.
Front Plant Sci. 2021 Dec 7;12:738119. doi: 10.3389/fpls.2021.738119. eCollection 2021.
Plant genomes consist, to a considerable extent, of non-coding repetitive DNA. Several studies have shown that phylogenetic signals can be extracted from such repeatome data by using among-species dissimilarities from the RepeatExplorer2 pipeline as distance measures. Here, we advanced this approach by adjusting the read input for comparative clustering inversely proportional to genome size and by summarizing all clusters into a main distance matrix, which was then subjected to Neighbor Joining algorithms and Principal Coordinate Analyses. Our multivariate statistical method thus works as a "repeatomic fingerprint," and we assessed its power and limitations by applying it to the family at the intrafamilial level and, in the genera and , at the intrageneric level. Since both taxa are prone to hybridization events, we tested whether repeatome data are suitable for unraveling the origin of natural and synthetic hybrids. In addition, we compared the results based on complete repeatomes with those from ribosomal DNA (rDNA) clusters only, because rDNA is one of the most widely used barcoding markers. Our results demonstrated that repeatome data contained a clear phylogenetic signal supporting the current subfamilial classification within . Accordingly, the well-accepted major evolutionary lineages within were distinguished, and hybrids occupied intermediate positions between their parental species in data sets retrieved from both complete repeatomes and rDNA clusters. Within the taxonomically more complicated and particularly frequently hybridizing genus , we detected rather weak phylogenetic signals but, surprisingly, found a geographic pattern at the population scale. In sum, our method yielded promising results at larger taxonomic scales as well as within taxa with manageable levels of reticulation, but its success remained rather taxon specific.
Since repeatomes can be retrieved with little technical effort and at comparatively low cost even from samples of rather poor DNA quality, our phylogenomic method serves as a valuable alternative when high-quality genomes are unavailable, for example, in the case of old museum specimens.
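The step of summarizing per-cluster dissimilarities into one main distance matrix and ordinating it can be illustrated with a minimal sketch. It rests on assumptions that the abstract does not confirm: the per-cluster matrices are combined by a plain (optionally weighted) mean, the ordination is classical Principal Coordinate Analysis via double centering, and the function names `summarize_clusters` and `pcoa` as well as the toy values for two parents (A, B) and a hybrid (H) are hypothetical. It is not the authors' implementation and does not call RepeatExplorer2.

```python
import numpy as np

def summarize_clusters(cluster_matrices, weights=None):
    """Combine per-cluster dissimilarity matrices into one main matrix.

    Assumption: a (weighted) mean over clusters; the source does not
    specify how the clusters were aggregated.
    """
    stacked = np.stack([np.asarray(m, dtype=float) for m in cluster_matrices])
    return np.average(stacked, axis=0, weights=weights)

def pcoa(dist, n_axes=2):
    """Classical PCoA: double-center squared distances, then eigendecompose."""
    d = np.asarray(dist, dtype=float)
    n = d.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (d ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)        # ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:n_axes]     # keep the largest ones
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Clip tiny negative eigenvalues caused by rounding before the square root.
    return eigvecs * np.sqrt(np.clip(eigvals, 0.0, None))

# Invented toy data: taxa ordered as parent A, hybrid H, parent B.
taxa = ["A", "H", "B"]
cluster_1 = [[0, 1, 2], [1, 0, 1], [2, 1, 0]]
cluster_2 = [[0, 2, 4], [2, 0, 2], [4, 2, 0]]

main = summarize_clusters([cluster_1, cluster_2])
coords = pcoa(main, n_axes=2)

for name, xy in zip(taxa, coords):
    print(name, np.round(xy, 3))
```

In this toy case the hybrid H lands midway between A and B on the first axis, which mirrors the intermediate hybrid positions described above. The sign of each axis is arbitrary, so only relative positions are meaningful.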