Research Group Bioinformatics (NG4), Robert Koch Institute, Nordufer 20, Berlin, 13353, Germany.
CAPES Foundation, Ministry of Education of Brazil, Brasília, 70040-020, DF, Brazil.
Microbiome. 2017 Aug 14;5(1):101. doi: 10.1186/s40168-017-0318-y.
Many metagenome analysis tools are presently available to classify sequences and profile environmental samples. In particular, taxonomic profiling and binning methods are commonly used for such tasks. Tools in these two categories use several techniques, e.g., read mapping, k-mer alignment, and composition analysis. The construction of the corresponding reference sequence databases also commonly varies. In addition, different tools perform well on different datasets and configurations. All this variation makes it difficult for researchers to decide which methods to use. Installation, configuration, and execution can also be difficult, especially when dealing with multiple datasets and tools.
We propose MetaMeta: a pipeline to execute and integrate results from metagenome analysis tools. MetaMeta provides an easy workflow to run multiple tools on multiple samples, producing a single enhanced output profile for each sample. MetaMeta includes database generation, pre-processing, execution, and integration steps, allowing easy execution and parallelization. The integration relies on the co-occurrence of organisms across different methods as the main feature to improve community profiling, while accounting for differences in the tools' databases.
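To make the co-occurrence idea concrete, here is a minimal, hypothetical sketch in Python. It is not MetaMeta's actual integration method, which the abstract says also accounts for differences between the tools' databases; it only illustrates the core principle that a taxon reported by several tools is more trustworthy than one reported by a single tool. The function name `integrate_profiles`, the `min_support` threshold, and the choice to average and renormalize abundances are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, List


def integrate_profiles(profiles: List[Dict[str, float]],
                       min_support: int = 2) -> Dict[str, float]:
    """Merge per-tool profiles (taxon -> relative abundance) into one profile.

    Hypothetical simplification of co-occurrence-based integration:
    a taxon is kept only if at least `min_support` tools report it with
    a positive abundance. Its abundance is the mean over the tools that
    reported it, and the kept taxa are renormalized to sum to 1.
    """
    support: Dict[str, int] = defaultdict(int)    # number of tools reporting each taxon
    totals: Dict[str, float] = defaultdict(float)  # summed abundance across those tools
    for profile in profiles:
        for taxon, abundance in profile.items():
            if abundance > 0:
                support[taxon] += 1
                totals[taxon] += abundance

    # Keep only taxa supported by enough tools; average their abundances.
    kept = {taxon: totals[taxon] / support[taxon]
            for taxon in support if support[taxon] >= min_support}

    # Renormalize so the integrated profile is again a relative-abundance profile.
    norm = sum(kept.values())
    return {taxon: a / norm for taxon, a in kept.items()} if norm > 0 else {}


# Example: three hypothetical tool outputs for one sample.
tool_a = {"Escherichia coli": 0.5, "Bacillus subtilis": 0.3, "Spurious sp.": 0.2}
tool_b = {"Escherichia coli": 0.6, "Bacillus subtilis": 0.4}
tool_c = {"Escherichia coli": 0.7, "Other sp.": 0.3}
merged = integrate_profiles([tool_a, tool_b, tool_c], min_support=2)
print({taxon: round(a, 3) for taxon, a in merged.items()})
```

In this toy example, "Spurious sp." and "Other sp." are each reported by only one tool and are dropped, while the two taxa supported by at least two tools are kept with averaged, renormalized abundances. Raising `min_support` trades sensitivity for precision.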
In a controlled case with simulated and real data, we show that the integrated profiles of MetaMeta outperform the best single profile. Using the same input data, MetaMeta provides more sensitive and reliable results, with the presence of each organism supported by several methods. MetaMeta uses Snakemake and has six pre-configured tools, all available on the BioConda channel for easy installation (conda install -c bioconda metameta). The MetaMeta pipeline is open-source and can be downloaded at: https://gitlab.com/rki_bioinformatics .