Department of Earth and Planetary Science, University of California, Berkeley, 307 McCone Hall #4767, Berkeley, CA 94720, USA.
Genome Biol. 2011;12(5):R44. doi: 10.1186/gb-2011-12-5-r44. Epub 2011 May 19.
Recovery of ribosomal small subunit genes by assembly of short-read community DNA sequence data generally fails, making taxonomic characterization difficult. Here, we solve this problem with a novel iterative method, based on the expectation maximization algorithm, that reconstructs full-length small subunit gene sequences and provides estimates of relative taxon abundances. We apply the method to natural and simulated microbial communities, and correctly recover community structure from known and previously unreported rRNA gene sequences. An implementation of the method is freely available at https://github.com/csmiller/EMIRGE.
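To make the abundance-estimation idea concrete, the following is a minimal sketch of a generic expectation maximization loop for mixture proportions. It is not the EMIRGE implementation: it assumes the candidate sequences are fixed and that a per-read likelihood table is already available, whereas the method described above also reconstructs the sequences themselves. The function name `em_abundances` and the `likelihoods` table are hypothetical and introduced only for illustration.

```python
def em_abundances(likelihoods, n_iter=100, tol=1e-9):
    """Estimate relative abundances of candidate sequences with a toy EM loop.

    likelihoods[i][j] is an assumed, precomputed P(read i | sequence j).
    Returns a list of abundances that sums to 1.
    """
    n_seqs = len(likelihoods[0])
    # Start from a uniform prior over candidate sequences.
    priors = [1.0 / n_seqs] * n_seqs

    for _ in range(n_iter):
        # E-step: split each read across sequences in proportion to
        # prior * likelihood (its posterior assignment probability).
        counts = [0.0] * n_seqs
        for row in likelihoods:
            weighted = [p * l for p, l in zip(priors, row)]
            total = sum(weighted)
            if total == 0.0:
                continue  # read matches no candidate; ignore it
            for j, w in enumerate(weighted):
                counts[j] += w / total

        assigned = sum(counts)
        if assigned == 0.0:
            break  # no read could be assigned; keep the current priors

        # M-step: new abundances are the normalized expected read counts.
        new_priors = [c / assigned for c in counts]
        delta = max(abs(a - b) for a, b in zip(priors, new_priors))
        priors = new_priors
        if delta < tol:
            break

    return priors


# Hypothetical input: three reads match only sequence 0, one only sequence 1.
example = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
abundances = em_abundances(example)
```

With unambiguous reads, as in this example, the estimate reduces to simple read counting; the EM iterations matter when reads are compatible with several candidate sequences and must be shared among them.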