The use of taxon-specific reference databases compromises metagenomic classification.

Author information

Marie Bashir Institute for Infectious Diseases and Biosecurity and Faculty of Medicine and Health, Sydney Medical School, Westmead Clinical School, The University of Sydney, Sydney, NSW, 2006, Australia.

Centre for Infectious Diseases and Microbiology, Westmead Institute for Medical Research, Westmead, NSW, 2145, Australia.

Publication information

BMC Genomics. 2020 Feb 27;21(1):184. doi: 10.1186/s12864-020-6592-2.

Abstract

A recent article in BMC Genomics describes a new bioinformatics tool, HumanMycobiomeScan, for classifying fungal taxa in metagenomic samples. The tool was used to characterize the gut mycobiome of hunter-gatherers and Western populations, and it identified a range of fungal species in the vast majority of samples. In the HumanMycobiomeScan pipeline, sequence reads are mapped against a reference database containing fungal genome sequences only. We argue that using reference databases composed of a single taxonomic group leads to an unacceptably high number of false positives due to (i) mapping to conserved genetic regions in reference genomes and (ii) sequence contamination in the assembled reference genomes. To demonstrate this, we replaced HumanMycobiomeScan's fungal reference database with one containing genome sequences of amphibians and reptiles and re-analysed the original case study. The classification pipeline recovered all species present in the reference database, revealing turtles (Geoemydidae), bull frogs (Pyxicephalidae) and snakes (Colubridae) as the most abundant herpetological taxa in the human gut. We also re-analysed the case study using a kingdom-agnostic pipeline. This revealed that while the gut of hunter-gatherers and Western subjects may be colonized by a range of microbial eukaryotes, only three fungal families were retrieved. These results highlight the pitfalls of using taxon-specific reference databases for metagenome classification, even when they are composed of curated whole-genome data. We propose that databases containing all domains of life provide the most suitable option for metagenomic species profiling, especially when targeting microbial eukaryotes.
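The first failure mode named in the abstract, reads mapping to conserved regions when the true source organism is missing from the reference database, can be illustrated with a toy example. The sketch below is not the HumanMycobiomeScan pipeline or its aligner; it is a minimal k-mer matcher written for illustration, and every sequence, taxon label, function name (`kmers`, `classify`) and threshold in it is a made-up assumption.

```python
# Toy illustration only: all sequences, labels and thresholds are hypothetical.
K = 8  # toy k-mer length (assumption)

def kmers(seq, k=K):
    """Return the set of all k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def classify(read, database, min_shared=5):
    """Assign a read to the reference sharing the most k-mers with it.

    Returns None when no reference shares at least `min_shared` k-mers.
    """
    read_kmers = kmers(read)
    best_taxon, best_score = None, 0
    for taxon, reference in database.items():
        score = len(read_kmers & kmers(reference))
        if score > best_score:
            best_taxon, best_score = taxon, score
    return best_taxon if best_score >= min_shared else None

# A region conserved across both toy genomes (stand-in for, e.g., a conserved gene).
conserved = "ACGTTGCAAGGCTTAACCGG"

fungal_ref = "TTTTCCCCAAAAGGGG" + conserved + "CATCATCATCAT"
bacterial_ref = "GAGAGAGATCTCTCTC" + conserved + "TGGTGGTGGACC"

# A read that truly originates from the toy bacterium and overlaps the conserved region.
read = "TCTCTCTC" + conserved + "TGGT"

fungi_only = {"Fungus (toy)": fungal_ref}
all_domains = {"Fungus (toy)": fungal_ref, "Bacterium (toy)": bacterial_ref}

print("fungi-only database :", classify(read, fungi_only))
print("all-domain database :", classify(read, all_domains))
```

With the fungi-only database the bacterial read has nowhere better to go and is reported as fungal, because the conserved k-mers alone clear the threshold. Once the bacterial reference is included, the same read is assigned to its true source. Real classifiers are far more sophisticated, but the sketch shows why the composition of the database, and not only its curation, determines the false-positive rate.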

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/713f/7045516/523b825294f2/12864_2020_6592_Fig1_HTML.jpg
