Université Paris-Saclay, INRAE, PROSE, F-92761, Antony, France.
Université Paris-Saclay, INRAE, MaIAGE, F-78350, Jouy-en-Josas, France.
BMC Genomics. 2021 Mar 16;22(1):186. doi: 10.1186/s12864-021-07471-y.
K-mer-based methods have greatly advanced in recent years, largely driven by the recognition of their biological significance and by the advent of next-generation sequencing. Their speed and their independence from the annotation process are major advantages. Their utility in the study of the mobilome has recently emerged, and they seem a priori well suited to the patchy gene distribution and the lack of universal marker genes in viruses and plasmids. To provide a framework for interpreting the results of k-mer-based methods applied to archaea or their mobilome, we analyzed the 5-mer DNA profiles of close to 600 archaeal cells, viruses and plasmids. Archaea is one of the three domains of life. Archaea seem enriched in extremophiles and are associated with a high diversity of viral and plasmid families, many of which are specific to this domain. We explored the dataset structure by multivariate and statistical analyses, seeking to identify the underlying factors.
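As an illustration of what a 5-mer DNA profile is, the following minimal Python sketch counts every overlapping window of length k in a sequence and normalizes the counts to relative frequencies. It is not the authors' pipeline: the function name `kmer_profile`, the choice to skip windows containing ambiguous bases, and the absence of reverse-complement merging are all assumptions made for this example.

```python
from collections import Counter
from itertools import product


def kmer_profile(seq, k=5):
    """Return the relative frequency of each of the 4**k DNA k-mers in seq.

    Hypothetical helper for illustration only. Windows containing
    characters other than A, C, G, T (e.g. N) are ignored.
    """
    seq = seq.upper()
    # Count every overlapping window of length k.
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Enumerate all possible k-mers so that every profile has the same keys.
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[m] for m in kmers)
    if total == 0:
        return {m: 0.0 for m in kmers}
    return {m: counts[m] / total for m in kmers}


if __name__ == "__main__":
    profile = kmer_profile("ACGTACGTACGTTTGACCA", k=5)
    print(len(profile))  # number of distinct 5-mers: 4**5 = 1024
```

With k = 5 each genome is summarized by a vector of 1024 frequencies, which is the kind of fixed-length representation that multivariate analyses can then compare across cells, viruses and plasmids.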
For cells, the 5-mer profiles were inconsistent with the phylogeny of archaea. At a finer taxonomic level, the influence of the taxonomy and the environmental constraints on 5-mer profiles was very strong. These two factors were interdependent to a significant extent, and the respective weights of their contributions varied according to the clade. A convergent adaptation was observed for the class Halobacteria, for which a strong 5-mer signature was identified. For mobile elements, coevolution with the host had a clear influence on their 5-mer profile. This enabled us to identify one previously known and one new case of recent host transfer based on the atypical composition of the mobile elements involved. Beyond the effect of coevolution, extrachromosomal elements strikingly retain the specific imprint of their own viral or plasmid taxonomic family in their 5-mer profile.
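One simple way to picture how an atypical composition might flag a candidate host transfer is to compare the profile of a mobile element with a set of reference host profiles and see which one is nearest. The sketch below uses Euclidean distance on frequency dictionaries; the distance measure, the function names `profile_distance` and `closest_reference`, and the toy profiles are assumptions for illustration and do not reproduce the multivariate analyses used in the study.

```python
import math


def profile_distance(p, q):
    """Euclidean distance between two k-mer frequency profiles (dicts)."""
    keys = set(p) | set(q)
    return math.sqrt(sum((p.get(m, 0.0) - q.get(m, 0.0)) ** 2 for m in keys))


def closest_reference(query, references):
    """Return (name, distance) of the reference profile nearest to query."""
    return min(
        ((name, profile_distance(query, ref)) for name, ref in references.items()),
        key=lambda item: item[1],
    )


if __name__ == "__main__":
    # Toy 2-mer-like profiles, invented purely for demonstration.
    hosts = {
        "host_A": {"AA": 0.5, "CC": 0.5},
        "host_B": {"AA": 0.1, "CC": 0.9},
    }
    element = {"AA": 0.2, "CC": 0.8}
    print(closest_reference(element, hosts))
```

A mobile element whose nearest profile is not that of its recorded host would only be a candidate worth examining, not proof of transfer, since short k-mer profiles are shaped by several factors besides the host.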
This specific imprint confirms that the evolution of extrachromosomal elements is driven by multiple parameters and is not restricted to host adaptation. In addition, we detected only recent host transfer events, suggesting the fast evolution of short k-mer profiles. This calls for caution when using k-mers for host prediction, metagenomic binning or phylogenetic reconstruction.