Applying rearrangement distances to enable plasmid epidemiology with pling.

Author information

European Bioinformatics Institute, Hinxton, Cambridge CB10 1SD, UK.

Centre for Immunology and Infection Control, Queensland University of Technology, Brisbane, Queensland, Australia.

Publication information

Microb Genom. 2024 Oct;10(10). doi: 10.1099/mgen.0.001300.

Abstract

Plasmids are a key vector of antibiotic resistance, but the current bioinformatics toolkit is not well suited to tracking them. The rapid structural changes seen in plasmid genomes present considerable challenges to evolutionary and epidemiological analysis. Typical approaches are either low resolution (replicon typing) or use shared k-mer content to define a genetic distance. However, this distance can both overestimate plasmid relatedness by ignoring rearrangements, and underestimate by over-penalizing gene gain/loss. Therefore a model is needed which captures the key components of how plasmid genomes evolve structurally - through gene/block gain or loss, and rearrangement. A secondary requirement is to prevent promiscuous transposable elements (TEs) leading to over-clustering of unrelated plasmids. We choose the 'Double Cut and Join Indel' (DCJ-Indel) model, in which plasmids are studied at a coarse level, as a sequence of signed integers (representing genes or aligned blocks), and the distance between two plasmids is the minimum number of rearrangement events or indels needed to transform one into the other. We show how this gives much more meaningful distances between plasmids. We introduce a software workflow pling (https://github.com/iqbal-lab-org/pling), which uses the DCJ-Indel model, to calculate distances between plasmids and then cluster them. In our approach, we combine containment distances and DCJ-Indel distances to build a TE-aware plasmid network. We demonstrate superior performance and interpretability to other plasmid clustering tools on the 'Russian Doll' dataset and a hospital transmission dataset.
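The signed-integer representation described above can be illustrated with a minimal sketch. It computes the plain DCJ distance (without the indel component) between two circular genomes that share exactly the same set of blocks, using the standard adjacency-graph result that the distance equals the number of blocks minus the number of cycles. This is not pling's implementation: the function names are hypothetical, and the restriction to equal block content and a single circular molecule is a simplifying assumption. The full DCJ-Indel model additionally handles blocks present in only one plasmid.

```python
def extremities(gene):
    """Return the (left, right) extremities of a nonzero signed gene as read in the genome.

    A forward gene is read tail -> head; a reversed gene is read head -> tail.
    """
    g = abs(gene)
    tail, head = (g, "t"), (g, "h")
    return (tail, head) if gene > 0 else (head, tail)


def adjacency_partners(genome):
    """Map each extremity to the extremity it is adjacent to in a circular genome."""
    partners = {}
    n = len(genome)
    for i in range(n):
        _, right = extremities(genome[i])
        left, _ = extremities(genome[(i + 1) % n])  # wrap around: circular molecule
        partners[right] = left
        partners[left] = right
    return partners


def dcj_distance_circular(a, b):
    """Plain DCJ distance (no indels) between two circular genomes.

    Assumes both genomes contain the same blocks, each exactly once.
    Distance = number of blocks - number of cycles in the adjacency graph.
    """
    genes_a = sorted(abs(g) for g in a)
    genes_b = sorted(abs(g) for g in b)
    if genes_a != genes_b or len(set(genes_a)) != len(genes_a):
        raise ValueError("sketch requires identical, duplicate-free block content")

    pa, pb = adjacency_partners(a), adjacency_partners(b)
    seen = set()
    cycles = 0
    for start in pa:
        if start in seen:
            continue
        cycles += 1
        cur = start
        # Alternate between an adjacency in genome a and an adjacency in genome b
        # until the walk returns to its starting extremity.
        while cur not in seen:
            seen.add(cur)
            mid = pa[cur]
            seen.add(mid)
            cur = pb[mid]
    return len(a) - cycles


print(dcj_distance_circular([1, 2, 3], [2, 3, 1]))        # 0: same circle, different start
print(dcj_distance_circular([1, 2, 3], [1, -2, 3]))       # 1: one inversion
print(dcj_distance_circular([1, 2, 3, 4], [1, 3, 2, 4]))  # 2: swap of two adjacent blocks
```

In this toy setting, the rotated genome has distance 0, showing that the distance depends on block order and orientation around the circle and not on where the sequence happens to start.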