Department of Computer Science, University of Maryland, College Park, MD 20742, USA.
Bioinformatics. 2011 Nov 1;27(21):2964-71. doi: 10.1093/bioinformatics/btr520. Epub 2011 Sep 16.
Sequencing projects increasingly target samples from non-clonal sources. In particular, metagenomics has enabled scientists to begin to characterize the structure of microbial communities. The software tools developed for assembling and analyzing sequencing data for clonal organisms are, however, unable to adequately process data derived from non-clonal sources.
We present a new scaffolder, Bambus 2, to address some of the challenges encountered when analyzing metagenomes. Our approach relies on a combination of a novel method for detecting genomic repeats and algorithms that analyze assembly graphs to identify biologically meaningful genomic variants. We compare our software to current assemblers using simulated and real data. We demonstrate that the repeat detection algorithms have higher sensitivity than current approaches without sacrificing specificity. In metagenomic datasets, the scaffolder avoids false joins between distantly related organisms while obtaining long-range contiguity. Bambus 2 represents a first step toward automated metagenomic assembly.
Bambus 2 is open source and available from http://amos.sf.net.
Supplementary data are available at Bioinformatics online.