
skDER and CiDDER: two scalable approaches for microbial genome dereplication.

Authors

Salamzade Rauf, Kottapalli Aamuktha, Kalan Lindsay R

Affiliations

Department of Medical Microbiology and Immunology, School of Medicine and Public Health, University of Wisconsin-Madison, Madison, WI, USA.

Microbiology Doctoral Training Program, University of Wisconsin-Madison, Madison, WI, USA.

Publication information

Microb Genom. 2025 Jul;11(7). doi: 10.1099/mgen.0.001438.

Abstract

An abundance of microbial genomes have been sequenced in the past two decades. For fundamental comparative genomic investigations, where the goal is to determine the major gain and loss events shaping the pangenome of a species or broader taxon, it is often unnecessary and computationally onerous to include all available genomes. In addition, the over-representation of specific lineages due to sampling and sequencing bias can have undesired effects on evolutionary analyses. To assist users, we developed skDER and CiDDER (https://github.com/raufs/skDER) to select a subset of representative genomes for downstream comparative genomic investigations. skDER is a nucleotide-based genomic dereplication tool that can dereplicate thousands of microbial genomes by leveraging recent advances in average nucleotide identity (ANI) inference. CiDDER dereplicates microbial genomes based on saturation assessment of distinct protein-coding genes. To support usability, auxiliary functionalities are incorporated for testing the number of representative genomes that result from various clustering parameters, automatically downloading genomes belonging to a bacterial species or genus, clustering non-representative genomes to their closest representative genomes, and filtering plasmids and phages prior to dereplication. In benchmarking against other ANI-based dereplication tools, skDER, when run in the default mode, was efficient, achieved comparable pangenome coverage, and strictly adhered to user-defined cutoffs for both ANI and aligned fraction (AF). Further, we showcase that CiDDER is a convenient alternative to ANI-based dereplication that allows users to more directly optimize the selection of representative genomes to cover a large breadth of a taxon's pangenome.
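The two selection strategies named in the abstract can be illustrated with a minimal sketch. This is not the skDER or CiDDER implementation: the function names, input structures, cutoff values and the symmetric treatment of AF are all assumptions made for illustration. The first function greedily keeps a genome only if it does not pass both the ANI and AF cutoffs against any already chosen representative; the second greedily adds the genome that contributes the most not-yet-covered protein clusters until a target fraction of all clusters is reached.

```python
from typing import Dict, List, Set, Tuple


def greedy_ani_dereplicate(
    genomes: List[str],
    pairs: Dict[Tuple[str, str], Tuple[float, float]],
    ani_cutoff: float = 99.0,  # illustrative value, not a documented default
    af_cutoff: float = 90.0,   # illustrative value, not a documented default
) -> List[str]:
    """Keep genomes that are not redundant with an earlier representative.

    `genomes` is assumed to be pre-sorted by preference (e.g. assembly quality).
    `pairs` maps a genome pair to (ANI %, AF %); missing pairs count as dissimilar.
    AF is treated as symmetric here, which is a simplification.
    """
    def redundant(a: str, b: str) -> bool:
        ani, af = pairs.get((a, b), pairs.get((b, a), (0.0, 0.0)))
        return ani >= ani_cutoff and af >= af_cutoff

    representatives: List[str] = []
    for genome in genomes:
        if not any(redundant(genome, rep) for rep in representatives):
            representatives.append(genome)
    return representatives


def greedy_gene_saturation(
    gene_sets: Dict[str, Set[str]],
    target_fraction: float = 0.9,  # hypothetical saturation target
) -> List[str]:
    """Select genomes until `target_fraction` of all protein clusters is covered."""
    all_clusters: Set[str] = set().union(*gene_sets.values())
    covered: Set[str] = set()
    selected: List[str] = []
    remaining = dict(gene_sets)
    while remaining and len(covered) < target_fraction * len(all_clusters):
        # Pick the genome adding the most new clusters; ties broken by name.
        best = max(sorted(remaining), key=lambda g: len(remaining[g] - covered))
        if not remaining[best] - covered:
            break  # no remaining genome adds anything new
        covered |= remaining.pop(best)
        selected.append(best)
    return selected


if __name__ == "__main__":
    toy_pairs = {
        ("A", "B"): (99.5, 95.0),
        ("A", "C"): (97.0, 92.0),
        ("B", "C"): (97.2, 91.0),
    }
    print(greedy_ani_dereplicate(["A", "B", "C"], toy_pairs))

    toy_genes = {"A": {"g1", "g2", "g3"}, "B": {"g1", "g2"}, "C": {"g3", "g4"}}
    print(greedy_gene_saturation(toy_genes))
```

In the toy data, genome B is dropped by the ANI-based function because it passes both cutoffs against A, while the gene-based function skips B because it adds no protein clusters beyond those already covered by A.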


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7614/12245536/f8bcd17aca2f/mgen-11-01438-g001.jpg
