MetaGen: reference-free learning with multiple metagenomic samples.

Affiliations

Department of Statistics, University of Georgia, Athens, 30602, GA, USA.

Department of Statistics, Harvard University, Cambridge, 02138, MA, USA.

Publication information

Genome Biol. 2017 Oct 3;18(1):187. doi: 10.1186/s13059-017-1323-y.

A major goal of metagenomics is to identify and study the entire collection of microbial species in a set of targeted samples. We describe a statistical metagenomic algorithm that simultaneously identifies microbial species and estimates their abundances without using reference genomes. As a trade-off, we require multiple metagenomic samples, usually ≥10 samples, to get highly accurate binning results. Compared to reference-free methods based primarily on k-mer distributions or coverage information, the proposed approach achieves a higher species binning accuracy and is particularly powerful when sequencing coverage is low. We demonstrated the performance of this new method through both simulation and real metagenomic studies. The MetaGen software is available at https://github.com/BioAlgs/MetaGen .
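
The abstract does not spell out the statistical model, so the following is only a minimal sketch of why multiple samples help, and not MetaGen's actual algorithm. It assumes that contigs from the same species show read counts that rise and fall together across samples, and it swaps in a simple greedy cosine-similarity grouping in place of the paper's method. The toy counts, the function names, and the 0.95 threshold are all hypothetical.

```python
from math import sqrt

def profile(counts):
    """Scale a contig's per-sample read counts so they sum to 1."""
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(counts)

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def bin_contigs(read_counts, threshold=0.95):
    """Greedy grouping: a contig joins the first bin whose pooled
    cross-sample profile is similar enough, else it starts a new bin."""
    bins = []  # each bin: {"members": [...], "sum": pooled counts per sample}
    for name, counts in read_counts.items():
        p = profile(counts)
        for b in bins:
            if cosine(p, profile(b["sum"])) >= threshold:
                b["members"].append(name)
                b["sum"] = [s + c for s, c in zip(b["sum"], counts)]
                break
        else:
            bins.append({"members": [name], "sum": list(counts)})
    return bins

def relative_abundance(bins):
    """Crude per-sample share of binned reads for each bin
    (no correction for contig length or unbinned reads)."""
    n = len(bins[0]["sum"])
    totals = [sum(b["sum"][j] for b in bins) for j in range(n)]
    return [[b["sum"][j] / totals[j] if totals[j] else 0.0 for j in range(n)]
            for b in bins]

# Hypothetical toy data: read counts for 4 contigs across 4 samples.
toy = {
    "contig_a": [10, 40, 5, 20],
    "contig_b": [21, 79, 11, 40],
    "contig_c": [50, 5, 30, 2],
    "contig_d": [98, 11, 61, 5],
}
bins = bin_contigs(toy)
abundance = relative_abundance(bins)
for b, row in zip(bins, abundance):
    print(b["members"], [round(x, 2) for x in row])
```

With only one sample, each contig would contribute a single count and the two groups could not be told apart by coverage pattern. With several samples, the co-varying profiles separate them, which is the intuition behind requiring multiple samples.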
