Comparison of long- and short-read metagenomic assembly for low-abundance species and resistance genes.

Affiliations

Infectious Disease and Microbiome Program, Broad Institute of MIT and Harvard, Cambridge, MA, USA.

Applied Invention, LLC, Cambridge, MA, USA.

Publication information

Brief Bioinform. 2023 Mar 19;24(2). doi: 10.1093/bib/bbad050.

DOI: 10.1093/bib/bbad050

PMID: 36804804

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10025444/

Abstract

Recent technological and computational advances have made metagenomic assembly a viable approach to achieving high-resolution views of complex microbial communities. In previous benchmarking, short-read (SR) metagenomic assemblers had the highest accuracy, long-read (LR) assemblers generated the most contiguous sequences and hybrid (HY) assemblers balanced length and accuracy. However, no assessments have specifically compared the performance of these assemblers on low-abundance species, which include clinically relevant organisms in the gut. We generated semi-synthetic LR and SR datasets by spiking small and increasing amounts of Escherichia coli isolate reads into fecal metagenomes and, using different assemblers, examined E. coli contigs and the presence of antibiotic resistance genes (ARGs). For ARG assembly, although SR assemblers recovered more ARGs with high accuracy, even at low coverages, LR assemblies allowed for the placement of ARGs within longer, E. coli-specific contigs, thus pinpointing their taxonomic origin. HY assemblies identified resistance genes with high accuracy and had lower contiguity than LR assemblies. Each assembler type's strengths were maintained even when our isolate was spiked in with a competing strain, which fragmented and reduced the accuracy of all assemblies. For strain characterization and determining gene context, LR assembly is optimal, while for base-accurate gene identification, SR assemblers outperform other options. HY assembly offers contiguity and base accuracy, but requires generating data on multiple platforms, and may suffer high misassembly rates when strain diversity exists. Our results highlight the trade-offs associated with each approach for recovering low-abundance taxa, and that the optimal approach is goal-dependent.
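The spike-in design described above can be illustrated with simple coverage arithmetic. The sketch below is not the authors' pipeline; it only shows how many isolate reads would be needed to reach a target depth, using the standard relation reads = coverage x genome size / mean read length. The genome size (about 5 Mb for E. coli), the read lengths (150 bp for SR, 5,000 bp for LR) and the coverage levels are illustrative assumptions, not values taken from the paper.

```python
import math


def reads_for_coverage(target_coverage: float,
                       genome_size_bp: int,
                       mean_read_length_bp: float) -> int:
    """Return the number of reads needed to cover a genome at the target depth.

    Uses coverage = (number of reads * mean read length) / genome size,
    rounded up to a whole read.
    """
    if target_coverage < 0:
        raise ValueError("target_coverage must be non-negative")
    if genome_size_bp <= 0 or mean_read_length_bp <= 0:
        raise ValueError("genome size and read length must be positive")
    return math.ceil(target_coverage * genome_size_bp / mean_read_length_bp)


# All values below are hypothetical and chosen only for illustration.
ASSUMED_GENOME_SIZE_BP = 5_000_000   # rough E. coli genome size (assumption)
ASSUMED_SR_LENGTH_BP = 150           # typical short-read length (assumption)
ASSUMED_LR_LENGTH_BP = 5_000         # illustrative long-read mean length (assumption)

if __name__ == "__main__":
    for coverage in (1, 5, 10):
        n_short = reads_for_coverage(coverage, ASSUMED_GENOME_SIZE_BP, ASSUMED_SR_LENGTH_BP)
        n_long = reads_for_coverage(coverage, ASSUMED_GENOME_SIZE_BP, ASSUMED_LR_LENGTH_BP)
        print(f"{coverage}x: {n_short} short reads, {n_long} long reads")
```

Under these assumptions, the same depth requires far fewer long reads than short reads, which is one reason the two data types behave differently at low spike-in levels.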
