Reynolds Gillian, Mumey Brendan, Strnadova-Neeley Veronika, Lachowiec Jennifer
Plant Sciences and Plant Pathology Department Montana State University Bozeman 59717 Montana USA.
Gianforte School of Computing Montana State University Bozeman 59717 Montana USA.
Appl Plant Sci. 2024 Apr 29;12(4):e11581. doi: 10.1002/aps3.11581. eCollection 2024 Jul-Aug.
The genomes of polyploid plants archive the evolutionary events leading to their present forms. However, polyploid plant genomes present numerous hurdles to the genome comparison algorithms used to classify polyploid types and explore genome dynamics.
Here, the problem of intra- and inter-genome comparison for examining polyploid genomes is reframed as a metagenomic problem, enabling the use of the rapid and scalable MinHashing approach. To determine how types of polyploidy are described by this metagenomic approach, plant genomes from across the polyploid spectrum were examined for both k-mer composition and frequency over a range of k-mer sizes. In this approach, no subgenome-specific k-mers were identified; rather, whole-chromosome k-mer subspaces were used.
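To illustrate the idea of comparing sequences through downsampled k-mer sets, the following is a minimal sketch of a bottom-s MinHash signature and the Jaccard estimate derived from it. It is not the authors' implementation: the function names, the choice of BLAKE2b as the hash, and the toy sequences are all assumptions made for illustration, and k-mers are not canonicalized (reverse complements are treated as distinct).

```python
import hashlib


def kmers(seq, k):
    """Return the set of all k-mers in seq (uppercased, not canonicalized)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def hash64(kmer):
    """Deterministic 64-bit hash of a k-mer (hash choice is an assumption)."""
    digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def signature(seq, k, size):
    """Bottom-s MinHash sketch: the `size` smallest distinct k-mer hashes."""
    return sorted({hash64(km) for km in kmers(seq, k)})[:size]


def jaccard_estimate(sig_a, sig_b, size):
    """Estimate Jaccard similarity from two bottom-s sketches."""
    a, b = set(sig_a), set(sig_b)
    # The `size` smallest hashes of the union form a sketch of the union.
    merged = sorted(a | b)[:size]
    if not merged:
        return 0.0
    shared = sum(1 for h in merged if h in a and h in b)
    return shared / len(merged)


def jaccard_exact(seq_a, seq_b, k):
    """Exact Jaccard similarity of the two full k-mer sets, for comparison."""
    a, b = kmers(seq_a, k), kmers(seq_b, k)
    union = a | b
    return len(a & b) / len(union) if union else 0.0
```

With a sketch size at least as large as the number of distinct k-mers, the estimate coincides with the exact Jaccard similarity; smaller sketches trade accuracy for speed and memory, which is what makes the approach practical for whole chromosomes.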
Given chromosome-scale genome assemblies with sufficient subgenome-specific repetitive element content, literature-verified subgenomic and genomic evolutionary relationships were revealed, including distinguishing auto- from allopolyploidy and putative progenitor genome assignment. The sequences responsible belonged to the rapidly evolving landscape of transposable elements. An investigation into the MinHashing parameters revealed that the downsampled k-mer space (genomic signatures) produced excellent approximations of sequence similarity. Furthermore, the clustering approach used for comparison of the genomic signatures was scrutinized to ensure applicability of the metagenomics-based method.
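As a rough illustration of how per-chromosome genomic signatures might be grouped, the sketch below applies average-linkage agglomerative clustering to Jaccard distances between hash sets. The abstract does not specify the clustering algorithm used, so average linkage is an assumed stand-in, and the chromosome labels and hash values are hypothetical toy data rather than results from the study.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """Jaccard distance between two sets of k-mer hashes."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def average_linkage(signatures, n_clusters):
    """Cluster named signatures (dict: name -> set of hashes).

    Repeatedly merges the two clusters with the smallest mean pairwise
    Jaccard distance until `n_clusters` remain. Returns sorted name lists.
    """
    clusters = [[name] for name in signatures]

    def dist(c1, c2):
        pairs = [(x, y) for x in c1 for y in c2]
        total = sum(jaccard_distance(signatures[x], signatures[y])
                    for x, y in pairs)
        return total / len(pairs)

    while len(clusters) > n_clusters:
        # combinations yields i < j, so deleting index j leaves i valid.
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: dist(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return sorted(sorted(c) for c in clusters)


# Hypothetical toy signatures: two chromosomes from each of two subgenomes.
toy = {
    "chrA1": {1, 2, 3, 4},
    "chrA2": {1, 2, 3, 5},
    "chrB1": {10, 11, 12, 13},
    "chrB2": {10, 11, 12, 14},
}
groups = average_linkage(toy, n_clusters=2)
```

In this toy case, chromosomes sharing most of their hashes end up in the same group, mirroring how subgenome-specific repeat content would be expected to pull homoeologous chromosomes of one subgenome together.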
The easily implementable and highly computationally efficient MinHashing-based sequence comparison strategy enables comparative subgenomics and genomics for large and complex polyploid plant genomes. Such comparisons provide evidence for polyploidy-type subgenomic assignments. In cases where the subgenome-specific repeat signal may not be adequate given a chromosome's global k-mer profile, alternative methods that are more specific but more computationally complex outperform this approach.