Department of Bioinformatics, Institute for Microbiology and Genetics, Georg-August University, Göttingen, Germany.
Bioinformatics. 2011 Jun 15;27(12):1618-24. doi: 10.1093/bioinformatics/btr266. Epub 2011 May 5.
Inferring the taxonomic profile of a microbial community from a large collection of anonymous DNA sequencing reads is a challenging task in metagenomics. Because existing methods for taxonomic profiling of metagenomes are all based on the assignment of fragmentary sequences to phylogenetic categories, the accuracy of results largely depends on fragment length. This dependence complicates comparative analysis of data originating from different sequencing platforms or resulting from different preprocessing pipelines.
Here we introduce a new method for taxonomic profiling based on mixture modeling of the overall oligonucleotide distribution of a sample. Our results indicate that the mixture-based profiles compare well with taxonomic profiles obtained with other methods. In contrast to the existing methods, however, our approach shows nearly constant profiling accuracy across all read lengths and operates at unrivaled speed.
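To make the idea of mixture modeling concrete, the following minimal Python sketch treats the pooled k-mer distribution of a sample as a weighted mixture of reference k-mer distributions and estimates the weights. It is an illustration under stated assumptions, not the toolbox's implementation: the function names, the choice of k, and the expectation-maximization (EM) update for the weights are all assumptions made here, and the estimator used in the actual method may differ.

```python
from collections import Counter
from itertools import product


def kmer_distribution(sequences, k=2):
    """Relative frequency of every DNA k-mer, pooled over all sequences.

    Hypothetical helper for illustration. K-mers containing characters
    other than A, C, G, T are ignored.
    """
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    total = sum(counts[m] for m in kmers)
    if total == 0:
        raise ValueError("no valid k-mers found in the input sequences")
    return [counts[m] / total for m in kmers]


def estimate_mixture_weights(sample, references, iterations=500):
    """Estimate mixture weights with EM, keeping the components fixed.

    `sample` is the k-mer distribution of the whole read collection and
    `references` is a list of k-mer distributions, one per taxon.
    The EM update is an assumption of this sketch, not the paper's solver.
    """
    n = len(references)
    weights = [1.0 / n] * n
    for _ in range(iterations):
        updated = [0.0] * n
        for j, s_j in enumerate(sample):
            if s_j == 0:
                continue
            mix = sum(w * ref[j] for w, ref in zip(weights, references))
            if mix == 0:
                continue  # k-mer not explained by any reference
            for i in range(n):
                # Share of k-mer j attributed to reference i.
                updated[i] += s_j * weights[i] * references[i][j] / mix
        total = sum(updated)
        if total == 0:
            break  # nothing in the sample is explained by the references
        weights = [u / total for u in updated]
    return weights


if __name__ == "__main__":
    # Toy data: two made-up "genomes" and reads pooled from both.
    genome_a = "ATATTAATTTAAATATACATTA"
    genome_b = "GCGGCCGCGCCGGGCCGTGCGC"
    refs = [kmer_distribution([genome_a]), kmer_distribution([genome_b])]
    reads = [genome_a[:12], genome_a[8:], genome_b[:10]]
    print(estimate_mixture_weights(kmer_distribution(reads), refs))
```

Because the sketch only uses the pooled k-mer distribution of all reads, it never assigns an individual read to a taxon, which is the property the abstract credits for the weak dependence on read length.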
A platform-independent implementation of the mixture modeling approach is available as a MATLAB/Octave toolbox at http://gobics.de/peter/taxy. In addition, a prototype implementation in an easy-to-use interactive tool for Windows can be downloaded.