Department of Genetics, Bioinformatics, and Computational Biology, Virginia Tech, Blacksburg, Virginia, USA.
Department of Civil and Environmental Engineering, Virginia Tech, Blacksburg, Virginia, USA.
Appl Environ Microbiol. 2022 Sep 22;88(18):e0099122. doi: 10.1128/aem.00991-22. Epub 2022 Aug 29.
Bacterial mobile genetic elements (MGEs) encode functional modules that perform both core and accessory functions for the element, the latter of which are often only transiently associated with the element. The presence of these accessory genes, which are often close homologs of primarily immobile genes, incurs high rates of false positives and therefore limits the usability of MGE databases for MGE annotation. To overcome this limitation, we analyzed 10,776,849 protein sequences derived from eight MGE databases to compile a comprehensive set of 6,140 manually curated protein families that are linked to the "life cycle" (integration/excision, replication/recombination/repair, transfer, stability/transfer/defense, and phage-specific processes) of plasmids, phages, and integrative, transposable, and conjugative elements. We overlay experimental information where available to create a tiered annotation scheme that distinguishes high-quality annotations from annotations inferred exclusively through bioinformatic evidence. We additionally provide an MGE-class label for each entry (e.g., plasmid or integrative element) and assign each entry a major and a minor category. The resulting database, mobileOG-db (for mobile orthologous groups), comprises over 700,000 deduplicated sequences encompassing five major mobileOG categories and more than 50 minor categories, providing a structured language and interpretable basis for an array of MGE-centered analyses. mobileOG-db can be accessed at mobileogdb.flsi.cloud.vt.edu/, where users can select, refine, and analyze custom subsets of the dynamic mobilome.
The analysis of bacterial mobile genetic elements (MGEs) in genomic data is a critical step toward profiling the root causes of antibiotic resistance, phenotypic or metabolic diversity, and the evolution of bacterial genera. Existing methods for MGE annotation demand considerable biological and computational expertise to use properly. To bridge this gap, we systematically analyzed 10,776,849 proteins derived from eight databases of MGEs to identify 6,140 MGE protein families that can serve as candidate hallmarks, i.e., proteins that can be used as "signatures" of MGEs to aid annotation. The resulting resource, mobileOG-db, provides a multilevel classification scheme that encompasses plasmid, phage, integrative, and transposable element protein families categorized into five major mobileOG categories and more than 50 minor categories. mobileOG-db thus provides a rich resource for simple and intuitive element annotation that can be integrated seamlessly into existing MGE detection pipelines and colocalization analyses.
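To make the multilevel scheme concrete, the following is a minimal Python sketch of how an entry carrying a major category, a minor category, an MGE-class label, and an evidence tier might be represented and filtered into a custom subset. It is an illustration only and not the mobileOG-db schema or API: the class `Entry`, its field names, the function `select_subset`, and the toy records (including the placeholder minor-category names) are all hypothetical. Only the five major category names are taken from the text above.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional

# The five major categories named in the abstract.
MAJOR_CATEGORIES = (
    "integration/excision",
    "replication/recombination/repair",
    "transfer",
    "stability/transfer/defense",
    "phage",
)


@dataclass(frozen=True)
class Entry:
    """Hypothetical record layout; field names are assumptions."""
    entry_id: str
    major: str          # one of MAJOR_CATEGORIES
    minor: str          # placeholder minor-category label
    mge_class: str      # e.g. "plasmid" or "integrative element"
    experimental: bool  # True if backed by experimental evidence

    def __post_init__(self) -> None:
        # Reject records whose major category is not one of the five.
        if self.major not in MAJOR_CATEGORIES:
            raise ValueError(f"unknown major category: {self.major!r}")


def select_subset(
    entries: Iterable[Entry],
    major: Optional[str] = None,
    mge_class: Optional[str] = None,
    experimental_only: bool = False,
) -> List[Entry]:
    """Return entries matching every filter that was supplied."""
    selected = []
    for entry in entries:
        if major is not None and entry.major != major:
            continue
        if mge_class is not None and entry.mge_class != mge_class:
            continue
        if experimental_only and not entry.experimental:
            continue
        selected.append(entry)
    return selected


# Made-up toy records, not real database content.
TOY_ENTRIES = [
    Entry("toy_0001", "integration/excision", "minor_a", "integrative element", True),
    Entry("toy_0002", "transfer", "minor_b", "plasmid", False),
    Entry("toy_0003", "phage", "minor_c", "phage", True),
    Entry("toy_0004", "transfer", "minor_d", "plasmid", True),
]

if __name__ == "__main__":
    for entry in select_subset(TOY_ENTRIES, major="transfer", experimental_only=True):
        print(entry.entry_id, entry.mge_class, entry.minor)
```

Keeping the evidence tier as an explicit field lets a downstream analysis choose between the stricter, experimentally supported subset and the broader set inferred through bioinformatic evidence alone.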