An in-depth evaluation of metagenomic classifiers for soil microbiomes.

Authors

Edwin Niranjana Rose, Fitzpatrick Amy Heather, Brennan Fiona, Abram Florence, O'Sullivan Orla

Affiliations

Teagasc, Moorepark Food Research Centre, Moorepark, Fermoy, Cork, Ireland.

Functional Environmental Microbiology, School of Biological and Chemical Sciences, Ryan Institute, University of Galway, Galway, Ireland.

Publication information

Environ Microbiome. 2024 Mar 28;19(1):19. doi: 10.1186/s40793-024-00561-w.

DOI:10.1186/s40793-024-00561-w

PMID:38549112

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10979606/

Abstract

BACKGROUND

Recent endeavours in metagenomics, exemplified by projects such as the Human Microbiome Project and Tara Oceans, have illuminated the complexities of microbial biomes. Robust bioinformatic pipelines and meticulous evaluation of their methodology have contributed to the success of these projects. The soil environment, however, poses unique challenges and requires specialized methodological exploration to maximize microbial insights. A notable limitation in soil microbiome studies is the dearth of soil-specific reference databases that reflect the complexity of soil communities and are available to classifiers. There is also a lack of in-vitro mock communities derived from soil strains against which taxonomic classification accuracy can be assessed.

RESULTS

In this study, we generated a custom in-silico mock community containing microbial genomes commonly observed in the soil microbiome. Using this mock community, we simulated shotgun sequencing data to evaluate the performance of three leading metagenomic classifiers: Kraken2 (supplemented with Bracken, and run with a custom database derived from GTDB-TK genomes as well as its own default database), Kaiju, and MetaPhlAn, each with its respective default database. Our results highlight the importance of optimizing taxonomic classification parameters and database selection, as well as of analysing both trimmed reads and contigs. Classifiers run with databases tailored to the specific taxa present in our samples produced fewer errors than those run with broader databases that include microbial eukaryotes, protozoa, or human genomes, highlighting the effectiveness of targeted taxonomic classification. Notably, optimal classifier performance was achieved when applying a relative abundance threshold of 0.001% or 0.005%. Kraken2 supplemented with Bracken and a custom database demonstrated superior precision, sensitivity, F1 score, and overall sequence classification. With the custom database, this classifier classified 99% of in-silico reads and 58% of real-world soil shotgun reads, and in the real-world data it identified previously overlooked phyla.
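To make the evaluation metrics concrete, the following is a minimal sketch of how precision, sensitivity, and F1 score could be computed for a taxonomic profile after applying a relative abundance threshold. It is not the authors' code: the function names, the taxa, and all abundance values are hypothetical toy data chosen only to illustrate how a threshold such as 0.001% or 0.005% removes low-abundance false positives.

```python
from typing import Dict, Set


def filter_by_abundance(profile: Dict[str, float], threshold_pct: float) -> Set[str]:
    """Return the taxa whose relative abundance (in percent) meets the threshold."""
    return {taxon for taxon, pct in profile.items() if pct >= threshold_pct}


def score(predicted: Set[str], truth: Set[str]) -> Dict[str, float]:
    """Presence/absence precision, sensitivity (recall) and F1 against a known mock community."""
    tp = len(predicted & truth)   # taxa reported and truly present
    fp = len(predicted - truth)   # taxa reported but absent from the mock community
    fn = len(truth - predicted)   # taxa present but not reported
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {"precision": precision, "sensitivity": sensitivity, "f1": f1}


# Hypothetical mock community (ground truth) and classifier output in percent.
truth = {"Bacillus subtilis", "Pseudomonas fluorescens", "Streptomyces coelicolor"}
profile = {
    "Bacillus subtilis": 40.0,
    "Pseudomonas fluorescens": 35.0,
    "Streptomyces coelicolor": 24.9958,
    "Escherichia coli": 0.004,    # toy false positive at low abundance
    "Homo sapiens": 0.0002,       # toy false positive at very low abundance
}

for threshold in (0.0, 0.001, 0.005):
    kept = filter_by_abundance(profile, threshold)
    print(f"threshold {threshold}%:", score(kept, truth))
```

On this toy profile, precision rises from 0.6 with no threshold to 0.75 at 0.001% and 1.0 at 0.005%, while sensitivity stays at 1.0. In real data a threshold can also remove true low-abundance taxa, so both metrics need to be checked together.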

CONCLUSION

This study underscores the potential advantages of in-silico methodological optimization in metagenomic analyses, especially when deciphering the complexities of soil microbiomes. We demonstrate that the choice of classifier and database significantly impacts microbial taxonomic profiling. Our findings suggest that employing Kraken2 with Bracken, coupled with a custom database of GTDB-TK genomes and fungal genomes and a relative abundance threshold of 0.001%, provides optimal accuracy in soil shotgun metagenome analysis.
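As an illustration of the recommended threshold in practice, the sketch below filters a Bracken-style abundance table at 0.001%. It assumes a tab-separated table with `name` and `fraction_total_reads` columns, where the latter is a fraction between 0 and 1; that layout is an assumption to check against your own Bracken output. The function name and the table contents are hypothetical toy data, not results from the study.

```python
import csv
import io
from typing import List, TextIO, Tuple


def filter_bracken_table(handle: TextIO, min_pct: float = 0.001) -> List[Tuple[str, float]]:
    """Keep rows whose relative abundance, converted to percent, is at least min_pct."""
    reader = csv.DictReader(handle, delimiter="\t")
    kept = []
    for row in reader:
        # Assumed column: fraction of total reads (0..1), converted here to percent.
        pct = float(row["fraction_total_reads"]) * 100.0
        if pct >= min_pct:
            kept.append((row["name"], pct))
    return kept


# Hypothetical toy table in the assumed column layout.
toy_table = (
    "name\ttaxonomy_id\ttaxonomy_lvl\tkraken_assigned_reads\t"
    "added_reads\tnew_est_reads\tfraction_total_reads\n"
    "Taxon A\t1\tS\t590000\t10000\t600000\t0.60000\n"
    "Taxon B\t2\tS\t390000\t9960\t399960\t0.39996\n"
    "Taxon C\t3\tS\t25\t5\t30\t0.00003\n"
    "Taxon D\t4\tS\t4\t1\t5\t0.000005\n"
)

for name, pct in filter_bracken_table(io.StringIO(toy_table), min_pct=0.001):
    print(f"{name}\t{pct:.4f}%")
```

Here Taxon D (0.0005%) falls below the 0.001% cut-off and is dropped, while Taxon C (0.003%) is retained. Raising `min_pct` to 0.005 would drop Taxon C as well.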

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/26f2/10979606/5a136975f3d2/40793_2024_561_Fig1_HTML.jpg

Cited by

Comparative evaluation of sequencing platforms: Pacific Biosciences, Oxford Nanopore Technologies, and Illumina for 16S rRNA-based soil microbiome profiling.

Front Microbiol. 2025 Aug 6;16:1633360. doi: 10.3389/fmicb.2025.1633360. eCollection 2025.

Testing the limits of short-reads metagenomic classifications programs in wastewater treating microbial communities.

Sci Rep. 2025 Jul 5;15(1):23997. doi: 10.1038/s41598-025-07734-8.

Consistent microbial insights across sequencing methods in soil studies: the role of reference taxonomies.

mSystems. 2025 Jul 22;10(7):e0105924. doi: 10.1128/msystems.01059-24. Epub 2025 Jun 10.

Metataxonomics Characterization of Soil Microbiome Extraction Method Using Different Dispersant Solutions.

Microorganisms. 2025 Apr 18;13(4):936. doi: 10.3390/microorganisms13040936.

Evaluating the potential of assembler-binner combinations in recovering low-abundance and strain-resolved genomes from human metagenomes.

Heliyon. 2025 Jan 14;11(2):e41938. doi: 10.1016/j.heliyon.2025.e41938. eCollection 2025 Jan 30.

Variability of microbiomes in winter rye, wheat, and triticale affected by snow mold: predicting promising microorganisms for the disease control.

Environ Microbiome. 2025 Jan 11;20(1):3. doi: 10.1186/s40793-025-00665-x.

Evaluating metagenomic analyses for undercharacterized environments: what's needed to light up the microbial dark matter?

bioRxiv. 2024 Nov 9:2024.11.08.622677. doi: 10.1101/2024.11.08.622677.
