Department of Computer Science and Department of Obstetrics and Gynecology, Wayne State University, Detroit, MI 48202, USA.
Bioinformatics. 2013 Oct 1;29(19):2395-401. doi: 10.1093/bioinformatics/btt420. Epub 2013 Aug 5.
Identifying every genome present in a microbial sample is an important and challenging task with crucial applications. The difficulty arises because a microbial sample typically contains millions of cells, the vast majority of which elude cultivation. The most accurate method to date, exhaustive single-cell sequencing using multiple displacement amplification, is intractable for a large number of cells. However, this barrier may be surmountable, because the number of cell types with distinct genome sequences is usually much smaller than the number of cells.
Here, we present a novel divide-and-conquer method to sequence and de novo assemble all distinct genomes present in a microbial sample, with a sequencing cost and computational complexity proportional to the number of genome types rather than the number of cells. The method is implemented in a tool called Squeezambler. We evaluated Squeezambler on simulated data, on which the divide-and-conquer method reduced the cost of sequencing compared with the naïve exhaustive approach.
Squeezambler and datasets are available at http://compbio.cs.wayne.edu/software/squeezambler/.