Department of Biological Sciences, Carnegie Mellon University, Pittsburgh, PA 15213, United States.
Department of Computational Biology, Carnegie Mellon University, Pittsburgh, PA 15213, United States.
Bioinformatics. 2024 Sep 1;40(Suppl 2):ii155-ii164. doi: 10.1093/bioinformatics/btae398.
Motivation: Mobile genetic elements (MGEs) are as ubiquitous in nature as they are varied in type, ranging from viral insertions to transposons to incorporated plasmids. Horizontal transfer of MGEs across bacterial species may also pose a significant threat to global health, because MGEs can harbor antibiotic resistance genes. However, despite cheap and rapid whole-genome sequencing, the varied nature of MGEs makes it difficult to fully characterize them, and existing methods for detecting MGEs often do not agree on what should count as an MGE. In this manuscript, we first define and argue in favor of a divergence-based characterization of mobile genetic elements.
Results: Using that paradigm, we present skandiver, a tool designed to efficiently detect MGEs from whole-genome assemblies without the need for gene annotation or markers. skandiver determines mobile elements via genome fragmentation, average nucleotide identity (ANI), and divergence time. By building on the scalable skani software for ANI computation, skandiver can query hundreds of complete assemblies against >65 000 representative genomes in a few minutes using 19 GB of memory, providing a scalable and efficient method for elucidating mobile element profiles in incomplete, uncharacterized genomic sequences. On isolated and integrated large plasmids (>10 kb), recall was 48% and 47% for skandiver, 59% and 17% for MobileElementFinder, and 86% and 32% for geNomad, respectively. On isolated large plasmids, skandiver's recall is therefore lower than that of the state-of-the-art reference-based methods geNomad and MobileElementFinder. However, skandiver achieves higher recall on integrated plasmids and, unlike the other methods, does so without comparing against a curated database, making it suitable for the discovery of novel MGEs.
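The divergence-based idea named in the Results (genome fragmentation, ANI, and divergence time) can be illustrated with a minimal sketch. This is not skandiver's implementation and does not call skani: the function names, the `Hit` record, and both thresholds (`min_ani`, `min_divergence_mya`) are hypothetical, and the ANI and divergence-time values are assumed to be supplied by external tools. The sketch only shows the general logic that a fragment nearly identical to a genome from a distantly diverged species is a candidate mobile element.

```python
from dataclasses import dataclass
from typing import Iterator, List, Tuple


def fragment(seq: str, size: int) -> Iterator[Tuple[int, str]]:
    """Yield (start, chunk) for consecutive non-overlapping chunks.

    The final chunk may be shorter than `size`.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(seq), size):
        yield start, seq[start:start + size]


@dataclass
class Hit:
    """One fragment-to-reference comparison (all fields are illustrative)."""
    start: int             # fragment start coordinate in the query assembly
    ani: float             # ANI in percent, assumed to come from an ANI tool
    divergence_mya: float  # assumed divergence time of the two species (Mya)


def flag_candidates(
    hits: List[Hit],
    min_ani: float = 95.0,              # hypothetical threshold
    min_divergence_mya: float = 100.0,  # hypothetical threshold
) -> List[Hit]:
    """Keep hits that are highly similar despite long divergence.

    High identity between distantly related genomes is unexpected under
    vertical inheritance, so such fragments are flagged as candidates.
    """
    return [
        h for h in hits
        if h.ani >= min_ani and h.divergence_mya >= min_divergence_mya
    ]


# Toy input: three fragments with made-up ANI and divergence values.
example_hits = [
    Hit(start=0, ani=99.1, divergence_mya=5.0),       # close relative: expected
    Hit(start=20000, ani=98.7, divergence_mya=900.0), # distant yet similar
    Hit(start=40000, ani=82.0, divergence_mya=900.0), # distant and dissimilar
]
candidates = flag_candidates(example_hits)
```

With the toy values above, only the fragment starting at 20000 is flagged, because it is the only one that combines high identity with a long divergence time.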