Crowdsourced benchmarking of taxonomic metagenome profilers: lessons learned from the sbv IMPROVER Microbiomics challenge.

Affiliations

PMI R&D, Philip Morris Products S.A., Quai Jeanrenaud 5, 2000, Neuchâtel, Switzerland.

Data Science and Informatics, Corteva Agrisciences, Ascendas IT Park, Madhapur, Hyderabad, 500081, India.

Publication information

BMC Genomics. 2022 Aug 30;23(1):624. doi: 10.1186/s12864-022-08803-2.

Abstract

BACKGROUND

Selecting an optimal computational strategy for analyzing metagenomics data is a decisive step in determining the microbial composition of a sample, and the choice is complicated by the large number of tools currently available. This study summarizes the results of the crowdsourced sbv IMPROVER Microbiomics Challenge, which was designed to evaluate the performance of off-the-shelf metagenomics software, and investigates the robustness of these results through an extended post-challenge analysis. In total, 21 off-the-shelf taxonomic metagenome profiling pipelines were benchmarked for their capacity to identify microbiome composition at various taxon levels across 104 shotgun metagenomics datasets of bacterial genomes (representative of various microbiome samples) from public databases. Performance was determined by comparing predicted taxonomy profiles with the gold standard.
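
As a rough illustration of comparing a predicted taxonomy profile with a gold standard, the sketch below scores the presence or absence of taxa with precision, recall, and F1. The abstract does not state which metrics the challenge used, so the function name, the taxon labels, and the choice of F1 are assumptions made for illustration only.

```python
def precision_recall_f1(predicted_taxa, gold_taxa):
    """Score a predicted set of taxa against a gold-standard set.

    Hypothetical helper: only presence/absence is compared,
    abundances are ignored.
    """
    predicted = set(predicted_taxa)
    gold = set(gold_taxa)
    true_positives = len(predicted & gold)

    # Precision: fraction of predicted taxa that are really present.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: fraction of truly present taxa that were predicted.
    recall = true_positives / len(gold) if gold else 0.0
    # F1: harmonic mean of precision and recall.
    if precision + recall == 0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Made-up example: four predicted taxa, three in the gold standard.
scores = precision_recall_f1(
    ["Bacteroides", "Prevotella", "Escherichia", "Listeria"],
    ["Bacteroides", "Prevotella", "Akkermansia"],
)
```

A profiler that reports many spurious taxa scores low on precision, while one that misses taxa scores low on recall. This is the tradeoff discussed in the results.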

RESULTS

Most taxonomic profilers performed homogeneously well at the phylum level but generated intermediate and heterogeneous scores at the genus and species levels, respectively. The kmer-based pipelines using Kraken with and without Bracken or using CLARK-S performed best overall, but they exhibited lower precision than the two marker-gene-based methods, MetaPhlAn and mOTU. Filtering out the 1% least abundant species, which were not reliably predicted, improved the performance of most profilers by increasing precision, but at the cost of recall. However, adaptive filtering thresholds determined from the sample's Shannon index increased the performance of most kmer-based profilers while mitigating the tradeoff between precision and recall.
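
The two filtering ideas above can be sketched as follows. The Shannon index is the standard formula H = -sum(p_i * ln p_i). Two things here are assumptions rather than the authors' method. First, the fixed filter treats the 1% as a relative-abundance cutoff. Second, `adaptive_threshold` and its `base` and `scale` parameters are a hypothetical mapping, because the abstract does not give the formula that links the Shannon index to the threshold.

```python
import math


def shannon_index(abundances):
    """Shannon diversity H = -sum(p * ln p) over taxa with non-zero abundance."""
    total = sum(abundances.values())
    if total == 0:
        return 0.0
    proportions = [v / total for v in abundances.values() if v > 0]
    return -sum(p * math.log(p) for p in proportions)


def filter_low_abundance(abundances, min_fraction=0.01):
    """Drop taxa below min_fraction of the total, then renormalize to sum to 1."""
    total = sum(abundances.values())
    if total == 0:
        return {}
    kept = {t: v for t, v in abundances.items() if v / total >= min_fraction}
    kept_total = sum(kept.values())
    return {t: v / kept_total for t, v in kept.items()}


def adaptive_threshold(shannon, base=0.01, scale=1.0):
    """HYPOTHETICAL mapping from diversity to a cutoff (not from the paper).

    Assumes more diverse samples should use a lower cutoff so that
    genuine low-abundance species are less likely to be discarded.
    """
    return base / (1.0 + scale * shannon)


# Made-up species-level profile (relative abundances).
profile = {"sp_a": 0.6, "sp_b": 0.395, "sp_c": 0.005}

fixed = filter_low_abundance(profile, min_fraction=0.01)
cutoff = adaptive_threshold(shannon_index(profile))
adaptive = filter_low_abundance(profile, min_fraction=cutoff)
```

With the fixed 1% cutoff, `sp_c` is removed. Under the assumed adaptive mapping the cutoff falls below 1% as diversity rises, which is one way a diversity-dependent threshold could soften the loss of recall.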

CONCLUSIONS

The kmer-based metagenomic pipelines using Kraken/Bracken or CLARK-S performed most robustly across a large variety of microbiome datasets. Removing unreliably predicted low-abundance species with diversity-dependent adaptive filtering thresholds further enhanced the performance of these tools. This work demonstrates the applicability of computational pipelines for accurately determining taxonomic profiles in clinical and environmental contexts and exemplifies the power of crowdsourcing for unbiased evaluation.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e96c/9429340/5972481c6233/12864_2022_8803_Fig1_HTML.jpg