Analyzing the performance of short-read classification tools on metagenomic samples toward proper diagnosis of diseases.

Affiliations

Computer Engineering Department, Ferdowsi University of Mashhad, Mashhad, Iran.

Department of Neurology, University of California Irvine, CA, USA.

Publication information

J Bioinform Comput Biol. 2024 Oct;22(5):2450012. doi: 10.1142/S0219720024500124. Epub 2024 Sep 17.

Abstract

Accurate knowledge of the viruses and bacteria (and their genomes) that have invaded our bodies is crucial for diagnosing many human diseases. The field of bioinformatics encompasses the complex computational methods required for this purpose. Metagenomics employs next-generation sequencing (NGS) technology to study and identify microbial communities in environmental samples, and it makes it possible to measure the relative abundance of different microbes. Various tools are available for detecting bacterial species in sequenced metagenomic samples. In this study, we focus on well-known taxonomic classification tools such as MetaPhlAn4, Centrifuge, Kraken2, and Bracken, and evaluate their species-level performance on synthetic and real datasets. The results indicate that MetaPhlAn4 identified species in the simulated dataset with high precision, while Kraken2 achieved the best area under the precision-recall curve (AUPR). Centrifuge, Kraken2, and Bracken estimated species abundances accurately, whereas MetaPhlAn4 had a higher L2 distance. In the analysis of a real dataset with samples from an inflammatory bowel disease (IBD) study, MetaPhlAn4 and Kraken2 had faster execution times, and the tools' results differed at the family and species levels. and were highlighted as the most abundant families by Centrifuge, Kraken2, and MetaPhlAn4, with abundances varying among the ulcerative colitis (UC), Crohn's disease (CD), and non-IBD control (CN) groups. () has the highest abundance among species in the CD and UC groups compared with the CN group. Bracken overestimated abundance, underscoring the need for caution when interpreting its results. These findings can help in selecting an appropriate short-read classifier and thereby aid in the diagnosis of target diseases.
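The comparison above rests on three species-level metrics: precision, AUPR, and the L2 distance between true and estimated abundance profiles. The following is a minimal sketch of how such metrics are commonly computed, not the authors' evaluation code. It assumes that each profile is a dict mapping species names to relative abundances (already normalized to sum to 1), that a species counts as detected when its estimated abundance exceeds a threshold, and that AUPR is approximated by step-wise average precision over species ranked by estimated abundance (ties are not treated specially). All function names and the example data are hypothetical; the paper's actual thresholds, normalization, and AUPR estimator may differ.

```python
import math


def precision_recall(true_species: set[str], predicted: dict[str, float],
                     threshold: float = 0.0) -> tuple[float, float]:
    """Species-level precision and recall (assumed definition).

    A species is 'called' when its estimated relative abundance exceeds threshold.
    """
    called = {sp for sp, ab in predicted.items() if ab > threshold}
    tp = len(called & true_species)  # correctly called species
    precision = tp / len(called) if called else 0.0
    recall = tp / len(true_species) if true_species else 0.0
    return precision, recall


def l2_distance(truth: dict[str, float], estimate: dict[str, float]) -> float:
    """Euclidean distance between two abundance profiles.

    Species absent from one profile are treated as having abundance 0 in it.
    """
    species = truth.keys() | estimate.keys()
    return math.sqrt(sum((truth.get(s, 0.0) - estimate.get(s, 0.0)) ** 2
                         for s in species))


def aupr(true_species: set[str], predicted: dict[str, float]) -> float:
    """Approximate AUPR as step-wise average precision.

    Species are ranked by estimated abundance (descending); each true positive
    adds precision-at-that-rank times the increase in recall.
    """
    if not true_species:
        return 0.0
    ranked = sorted(predicted.items(), key=lambda kv: kv[1], reverse=True)
    tp = 0
    area = 0.0
    prev_recall = 0.0
    for rank, (sp, _) in enumerate(ranked, start=1):
        if sp in true_species:
            tp += 1
            precision = tp / rank
            recall = tp / len(true_species)
            area += precision * (recall - prev_recall)
            prev_recall = recall
    return area


if __name__ == "__main__":
    # Hypothetical simulated sample: true profile vs. one tool's estimate.
    truth = {"A": 0.5, "B": 0.3, "C": 0.2}
    estimate = {"A": 0.45, "B": 0.35, "D": 0.2}  # misses C, reports spurious D
    print(precision_recall(set(truth), estimate))  # precision and recall both 2/3
    print(l2_distance(truth, estimate))            # about 0.29
    print(aupr(set(truth), estimate))              # 2/3
```

Precision and recall depend on the chosen detection threshold, which is why a threshold-free summary such as AUPR is often reported alongside them, while the L2 distance evaluates abundance estimates rather than presence or absence.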
