Department of Cell Physiology and Metabolism, Faculty of Medicine, Centre Medical Universitaire, 1206, Geneva, Switzerland.
Swiss Institute of Bioinformatics, Geneva, Switzerland.
BMC Bioinformatics. 2020 Jun 22;21(1):257. doi: 10.1186/s12859-020-03585-4.
Metagenomics studies provide valuable insight into the composition and function of microbial populations from diverse environments; however, data processing pipelines that rely on mapping reads to gene catalogs or to genome databases of cultured strains yield results that underrepresent the genes and functional potential of uncultured microbes. Recent improvements in sequence assembly methods have eased the reliance on genome databases, thereby allowing the recovery of genomes from uncultured microbes. However, configuring these tools, linking them with advanced binning and annotation tools, and maintaining provenance of the processing remain challenging for researchers.
Here we present ATLAS, a software package for customizable data processing from raw sequence reads to functional and taxonomic annotations, using state-of-the-art tools to assemble, annotate, quantify, and bin metagenome data. Abundance estimates at genome resolution are provided for each sample in a dataset. ATLAS is written in Python, and its workflow is implemented in Snakemake; it operates in a Linux environment and is compatible with Python 3.5+ and Anaconda 3+ versions. The source code for ATLAS is freely available and distributed under a BSD-3 license.
ATLAS provides a user-friendly, modular and customizable Snakemake workflow for metagenome data processing; it is easily installable with conda and maintained as open-source on GitHub at https://github.com/metagenome-atlas/atlas.
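To make the idea of per-sample abundance estimates at genome resolution concrete, the following is a minimal Python sketch under stated assumptions. It is not the ATLAS implementation: the function name `genome_relative_abundance`, its inputs (per-contig mapped read counts, contig lengths, and a contig-to-genome bin assignment), and the choice of length-normalized read density as the abundance measure are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def genome_relative_abundance(contig_reads, contig_lengths, contig_to_genome):
    """Estimate relative abundance per genome bin for a single sample.

    Hypothetical illustration, not the ATLAS method:
      contig_reads     -- dict: contig id -> number of reads mapped to it
      contig_lengths   -- dict: contig id -> contig length in bases
      contig_to_genome -- dict: contig id -> genome (bin) id
    """
    reads = defaultdict(int)
    length = defaultdict(int)

    # Pool mapped reads and sequence length over all contigs in each bin.
    for contig, genome in contig_to_genome.items():
        reads[genome] += contig_reads.get(contig, 0)
        length[genome] += contig_lengths[contig]

    # Reads per base, so that large genomes are not favoured by size alone.
    density = {g: reads[g] / length[g] for g in reads if length[g] > 0}

    # Normalize so the values for one sample sum to 1.
    total = sum(density.values())
    if total == 0:
        return {g: 0.0 for g in density}
    return {g: d / total for g, d in density.items()}


# Toy input: two genome bins, three contigs (made-up numbers).
example = genome_relative_abundance(
    contig_reads={"c1": 100, "c2": 100, "c3": 150},
    contig_lengths={"c1": 1000, "c2": 1000, "c3": 500},
    contig_to_genome={"c1": "genome_A", "c2": "genome_A", "c3": "genome_B"},
)
print(example)
```

Running the function once per sample would give a genome-by-sample abundance table. Dividing by bin length is one simple normalization among several, and a real pipeline would also have to decide how to treat unmapped and multi-mapped reads.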