Computational Biology Group, Illumina Cambridge Ltd, Chesterford Research Park, Little Chesterford, Essex, United Kingdom.
BMC Bioinformatics. 2013;14 Suppl 5(Suppl 5):S2. doi: 10.1186/1471-2105-14-S5-S2. Epub 2013 Apr 10.
Environmental shotgun sequencing (ESS) has potential to give greater insight into microbial communities than targeted sequencing of 16S regions, but requires much higher sequence coverage. The advent of next-generation sequencing has made it feasible for the Human Microbiome Project and other initiatives to generate ESS data on a large scale, but computationally efficient methods for analysing such data sets are needed.
Here we present metaBEETL, a fast taxonomic classifier for environmental shotgun sequences. It uses a Burrows-Wheeler Transform (BWT) index of the sequencing reads and an indexed database of microbial reference sequences. Unlike other BWT-based tools, our method has no upper limit on the number or the total size of the reference sequences in its database. By capturing sequence relationships between strains, our reference index also allows us to classify reads which are not unique to an individual strain but are nevertheless specific to some higher phylogenetic order.
Tested on datasets with known taxonomic composition, metaBEETL gave results that are competitive with existing similarity-based tools: due to normalization steps which other classifiers lack, the taxonomic profile computed by metaBEETL closely matched the true environmental profile. At the same time, its moderate running time and low memory footprint allow metaBEETL to scale well to large data sets.
Code to construct the BWT indexed database and for the taxonomic classification is part of the BEETL library, available as a github repository at git@github.com:BEETL/BEETL.git.
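The idea of assigning a sequence to the lowest taxon shared by every reference that contains it can be illustrated with a small sketch. This is not the BEETL implementation: it assumes a naive in-memory suffix array, a textbook BWT backward search, and three invented reference strains with an invented three-level lineage. All names (`REFS`, `LINEAGE`, `classify`) and all sequences are hypothetical. It also ignores reverse complements and the normalization steps mentioned above.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical toy references and root-to-strain lineages (not real data).
REFS = {
    "strain_A1": "ACGTTGCAAT",
    "strain_A2": "ACGTTGCGGT",
    "strain_B1": "TTACGTTCCA",
}
LINEAGE = {
    "strain_A1": ("genus_X", "species_A", "strain_A1"),
    "strain_A2": ("genus_X", "species_A", "strain_A2"),
    "strain_B1": ("genus_X", "species_B", "strain_B1"),
}

# Concatenate references with '#' separators and a unique '$' terminator,
# remembering where each reference starts in the concatenated text.
names = list(REFS)
text = "#".join(REFS[n] for n in names) + "$"
starts, pos = [], 0
for n in names:
    starts.append(pos)
    pos += len(REFS[n]) + 1

# Naive suffix array (toy sizes only) and the BWT derived from it:
# the BWT character is the one preceding each sorted suffix.
sa = sorted(range(len(text)), key=lambda i: text[i:])
bwt = "".join(text[i - 1] for i in sa)

# C[c]: number of characters in the text smaller than c.
# occ[c][i]: number of occurrences of c in bwt[:i].
counts = Counter(bwt)
C, total = {}, 0
for c in sorted(counts):
    C[c] = total
    total += counts[c]
occ = {c: [0] * (len(bwt) + 1) for c in counts}
for i, ch in enumerate(bwt):
    for c in counts:
        occ[c][i + 1] = occ[c][i] + (ch == c)


def classify(kmer):
    """Return the lowest taxon shared by all references containing kmer."""
    lo, hi = 0, len(bwt)  # half-open interval of suffix-array rows
    for ch in reversed(kmer):  # backward search, last character first
        if ch not in C:
            return None
        lo = C[ch] + occ[ch][lo]
        hi = C[ch] + occ[ch][hi]
        if lo >= hi:
            return None  # k-mer occurs in no reference
    # Map each matching text position back to the reference it lies in.
    hits = {names[bisect_right(starts, sa[r]) - 1] for r in range(lo, hi)}
    # Longest common prefix of the lineages = lowest common taxon.
    common = []
    for ranks in zip(*(LINEAGE[h] for h in hits)):
        if len(set(ranks)) != 1:
            break
        common.append(ranks[0])
    return common[-1] if common else None


for kmer in ("GCAAT", "ACGTTGC", "ACGTT", "GGGGG"):
    print(kmer, classify(kmer))
```

In this toy setting, a sequence found in only one strain is assigned to that strain, one shared by both strains of `species_A` is assigned to the species, and one shared across species is assigned to the genus. A sequence absent from every reference is left unclassified.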