Microbial Genomics and Bioinformatics Research Group, Max Planck Institute for Marine Microbiology, Celsiusstraße 1, 28359, Bremen, Germany.
Department of Life Sciences and Chemistry, Jacobs University Bremen gGmbH, Campus Ring 1, 28759, Bremen, Germany.
BMC Bioinformatics. 2019 Sep 5;20(1):453. doi: 10.1186/s12859-019-3031-y.
Metagenomics caused a quantum leap in microbial ecology. However, the inherent size and complexity of metagenomic data limit its interpretation. Quantifying metagenomic traits within metagenomic analysis workflows has the potential to improve the exploitation of metagenomic data. Metagenomic traits are organisms' characteristics linked to their performance. They are measured at the genomic level by taking a random sample of individuals in a community. As such, these traits provide valuable information to uncover microorganisms' ecological patterns. The Average Genome Size (AGS) and the 16S rRNA gene Average Copy Number (ACN) are two highly informative metagenomic traits that reflect microorganisms' ecological strategies as well as the conditions of the environments they inhabit.
Here, we present the ags.sh and acn.sh tools, which analytically derive the AGS and ACN metagenomic traits. These tools represent an advance on previous approaches to computing the AGS and ACN traits. Benchmarking shows that ags.sh is up to 11 times faster than state-of-the-art tools dedicated to the estimation of AGS. Both ags.sh and acn.sh show comparable or higher accuracy than existing tools used to estimate these traits. To exemplify the applicability of both tools, we analyzed the 139 prokaryotic metagenomes of TARA Oceans and revealed the ecological strategies associated with different water layers.
We took advantage of recent advances in gene annotation to develop the ags.sh and acn.sh tools, which combine ease of use with fast and accurate performance. Our tools compute the AGS and ACN metagenomic traits on unassembled metagenomes, allowing researchers to improve their metagenomic data analysis and gain deeper insights into microorganisms' ecology. The ags.sh and acn.sh tools are publicly available as Docker containers at https://github.com/pereiramemo/AGS-and-ACN-tools.
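As a rough illustration of what these two traits measure, the sketch below computes them from summary quantities of an unassembled metagenome. It is not the implementation of ags.sh or acn.sh. It assumes that the number of genomes in a sample can be approximated by the mean coverage of single-copy marker genes, that AGS is the total number of sequenced base pairs divided by that number, and that ACN is the 16S rRNA gene coverage divided by that number. All function names and input values are hypothetical.

```python
from statistics import mean


def number_of_genomes(marker_coverages):
    """Approximate the number of genomes in a sample (assumption:
    mean coverage of single-copy marker genes)."""
    if not marker_coverages:
        raise ValueError("at least one marker gene coverage is required")
    return mean(marker_coverages)


def average_genome_size(total_bp, n_genomes):
    """AGS (assumption): total sequenced base pairs per genome."""
    if n_genomes <= 0:
        raise ValueError("number of genomes must be positive")
    return total_bp / n_genomes


def average_copy_number(rrna_16s_coverage, n_genomes):
    """ACN (assumption): 16S rRNA gene coverage per genome."""
    if n_genomes <= 0:
        raise ValueError("number of genomes must be positive")
    return rrna_16s_coverage / n_genomes


# Hypothetical inputs, chosen only to make the arithmetic easy to follow.
marker_coverages = [9.5, 10.0, 10.5]  # coverage of three single-copy genes
total_bp = 35_000_000                 # base pairs in the metagenome
rrna_16s_coverage = 21.0              # coverage of the 16S rRNA gene

n_genomes = number_of_genomes(marker_coverages)
ags = average_genome_size(total_bp, n_genomes)
acn = average_copy_number(rrna_16s_coverage, n_genomes)
print(f"genomes: {n_genomes:.1f}, AGS: {ags:.0f} bp, ACN: {acn:.2f}")
```

With these made-up inputs, the sample contains about 10 genomes, giving an AGS of about 3.5 Mbp and an ACN of about 2.1 copies per genome.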