Youngblut Nicholas D, Ley Ruth E
Microbiome Science, Max Planck Institute for Developmental Biology, Tübingen, Baden-Württemberg, Germany.
PeerJ. 2021 Sep 16;9:e12198. doi: 10.7717/peerj.12198. eCollection 2021.
Mapping metagenome reads to reference databases is the standard approach for assessing microbial taxonomic and functional diversity from metagenomic data. However, public reference databases often lack recently generated genomic data such as metagenome-assembled genomes (MAGs), which can limit the sensitivity of read-mapping approaches. We previously developed the Struo pipeline to provide a straightforward method for constructing custom databases; however, the pipeline does not scale well enough to cope with the ever-increasing number of publicly available microbial genomes, nor does it allow efficient database updating as new data are generated. To address these issues, we developed Struo2, which is more than 3.5-fold faster than Struo at database generation and can also efficiently update existing databases. We also provide custom Kraken2, Bracken, and HUMAnN3 databases that can be easily updated with new genomes and/or individual gene sequences. Efficient database updating, coupled with our pre-generated databases, enables "assembly-enhanced" profiling, which increases database comprehensiveness via inclusion of native genomic content. Including newly generated genomic content can greatly increase database comprehensiveness, especially for understudied biomes, enabling more accurate assessments of microbiome diversity.