Microbiome Medicine Center, Department of Laboratory Medicine, Zhujiang Hospital, Southern Medical University, Guangzhou 510280, China.
Gigascience. 2024 Jan 2;13. doi: 10.1093/gigascience/giae047.
Mobilization (MOB) typing is a classification scheme for plasmid genomes based on their relaxase gene. Plasmids of different MOB types differ in host range, so MOB typing is crucial for investigating plasmid mobilization, especially the transmission of resistance genes and virulence factors. However, MOB typing of plasmid metagenomic data is challenging because metagenomic contigs are highly fragmented.
We developed MOBFinder, an 11-class classifier that assigns plasmid fragments to 10 MOB types or a nonmobilizable category. We first performed MOB typing of complete plasmid genomes according to relaxase information and then constructed an artificial benchmark dataset of plasmid metagenomic fragments (PMFs) from those complete plasmid genomes whose MOB types are well annotated. Next, based on natural language models, we used word vectors to characterize the PMFs. Several random forest classification models were trained and integrated to classify fragments of different lengths. On the benchmark dataset, MOBFinder outperformed previous tools such as MOBscan and MOB-suite, with an overall accuracy approximately 59% higher than that of MOB-suite. Moreover, the balanced accuracy, harmonic mean, and F1-score reached up to 99% for some MOB types. When applied to a cohort of patients with type 2 diabetes (T2D), MOBFinder offered insights suggesting that MOBF type plasmids, which are widely present in Escherichia and Klebsiella, and MOBQ type plasmids might accelerate antibiotic resistance transmission in patients with T2D.
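The feature step described above (word vectors characterizing fragments, with separate models for different fragment lengths) can be illustrated with a minimal sketch. This is not MOBFinder's code: the k-mer size, the toy embedding table, the length-bin boundaries, and all function names below are hypothetical and chosen only to show the idea of treating k-mers as "words", averaging their vectors into one fragment vector, and routing a fragment to a length-specific classifier.

```python
from typing import Dict, List, Sequence


def kmer_words(seq: str, k: int = 4) -> List[str]:
    """Split a DNA fragment into overlapping k-mer 'words'.

    k-mers containing characters other than A, C, G, T are skipped.
    The default k is an illustrative assumption.
    """
    seq = seq.upper()
    valid = set("ACGT")
    return [
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if set(seq[i:i + k]) <= valid
    ]


def fragment_vector(
    seq: str, embeddings: Dict[str, Sequence[float]], k: int = 4
) -> List[float]:
    """Average the word vectors of the k-mers found in the fragment.

    `embeddings` is a hypothetical k-mer -> vector table, such as one
    produced by a pretrained word-embedding model.
    """
    dim = len(next(iter(embeddings.values())))
    words = [w for w in kmer_words(seq, k) if w in embeddings]
    if not words:
        return [0.0] * dim
    sums = [0.0] * dim
    for word in words:
        for j, value in enumerate(embeddings[word]):
            sums[j] += value
    return [s / len(words) for s in sums]


def pick_length_bin(
    length: int, upper_bounds: Sequence[int] = (1000, 2000, 3000, 4000)
) -> int:
    """Return the index of the length-specific model for a fragment.

    The bin boundaries are placeholders, not the ones used by the tool.
    """
    for i, bound in enumerate(upper_bounds):
        if length <= bound:
            return i
    return len(upper_bounds)
```

In a full pipeline under these assumptions, the vector returned by `fragment_vector` would be passed to the random forest trained for the bin returned by `pick_length_bin`, which then predicts one of the 11 classes.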
To the best of our knowledge, MOBFinder is the first tool for MOB typing of PMFs. The tool is freely available at https://github.com/FengTaoSMU/MOBFinder.