Department of Molecular and Cell Biology, UC Berkeley, Berkeley, CA, USA.
Department of Genetics, Stanford University, Stanford, CA, USA.
Bioinformatics. 2017 Jul 15;33(14):2082-2088. doi: 10.1093/bioinformatics/btx106.
Read assignment is an important first step in many metagenomic analysis workflows, providing the basis for identification and quantification of species. However, ambiguity among the sequences of many strains makes it difficult to assign reads at the lowest level of taxonomy, so reads are typically assigned to the taxonomic level at which they are unambiguous. We explore connections between metagenomic read assignment and the quantification of transcripts from RNA-Seq data in order to develop novel methods for rapid and accurate quantification of metagenomic strains.
We find that pseudoalignment, an idea recently introduced in the RNA-Seq context, is highly applicable in the metagenomics setting. When pseudoalignment is coupled with the Expectation-Maximization (EM) algorithm, reads can be assigned far more accurately and quickly than is currently possible with state-of-the-art software, making it possible and practical for the first time to analyze abundances of individual genomes in metagenomics projects.
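To illustrate how an EM step can resolve reads that are compatible with several genomes, here is a minimal Python sketch. It is not the metakallisto implementation. It assumes that pseudoalignment has already reduced the reads to equivalence classes (sets of genomes each read is compatible with, with a read count per class), and it ignores genome length, which a real quantifier would account for. The function name `em_abundances` and its parameters are hypothetical.

```python
def em_abundances(eq_classes, n_genomes, n_iter=1000, tol=1e-10):
    """Estimate per-genome read fractions with a simple EM loop.

    eq_classes: dict mapping a tuple of genome indices (the genomes a group
                of reads is compatible with) to the number of such reads.
    n_genomes:  total number of reference genomes.
    Assumption: genome lengths are ignored, so the result is the fraction
    of reads attributed to each genome, not a length-normalized abundance.
    """
    # Start from a uniform guess over all genomes.
    theta = [1.0 / n_genomes] * n_genomes

    for _ in range(n_iter):
        expected = [0.0] * n_genomes

        # E-step: split each class's reads among its genomes in proportion
        # to the current abundance estimates.
        for genomes, n_reads in eq_classes.items():
            denom = sum(theta[g] for g in genomes)
            if denom == 0.0:
                continue  # no compatible genome has support; skip the class
            for g in genomes:
                expected[g] += n_reads * theta[g] / denom

        # M-step: renormalize expected read counts into fractions.
        total = sum(expected)
        if total == 0.0:
            break
        new_theta = [c / total for c in expected]

        # Stop once the estimates no longer change appreciably.
        delta = max(abs(a - b) for a, b in zip(theta, new_theta))
        theta = new_theta
        if delta < tol:
            break

    return theta


# Toy input: 30 reads unique to genome 0, 10 unique to genome 1,
# 60 shared by both, and a third genome with no compatible reads.
example_classes = {(0,): 30, (1,): 10, (0, 1): 60}
example_theta = em_abundances(example_classes, n_genomes=3)
```

In this toy input the ambiguous reads are divided according to the evidence from the uniquely assigned reads, so the genome with more unique reads also receives the larger share of the shared ones.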
Pipeline and analysis code can be downloaded from http://github.com/pachterlab/metakallisto.