Systems Biology Programme, Centro Nacional de Biotecnología (CNB-CSIC). C/Darwin 3, 28049 Madrid, Spain.
Environ Microbiol Rep. 2012 Jun;4(3):335-41. doi: 10.1111/j.1758-2229.2012.00338.x. Epub 2012 Apr 17.
In any metagenomic project, the coverage obtained for each species depends on its abundance. This makes it difficult to determine a priori how much DNA sequencing is needed to obtain high coverage for the dominant genomes in an environment. To aid the design of metagenomic sequencing projects, we have developed COVER, a web-based tool that estimates the coverage achieved for each species in an environmental sample. COVER uses a set of 16S rRNA sequences to estimate the number of operational taxonomic units (OTUs) in the sample, assigns them taxonomically, estimates their genome sizes and, most critically, corrects for the number of unobserved OTUs. COVER then calculates the amount of sequencing needed to achieve a given goal. Our tests and simulations indicate that the results obtained through COVER are in very good agreement with experimental results.
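The dependence of coverage on abundance and genome size can be illustrated with a minimal sketch. This is not COVER's implementation, which the abstract does not specify. It assumes a simple model in which reads are drawn in proportion to each species' share of the total DNA. The names `Otu`, `mean_dna_per_cell`, `coverage` and `required_bp` are hypothetical, as are the parameters `unobserved_fraction` (the share of cells belonging to OTUs missing from the 16S set) and `unobserved_genome_size` (their assumed mean genome size).

```python
from typing import NamedTuple, Optional, Sequence


class Otu(NamedTuple):
    name: str
    abundance: float    # relative abundance among the observed OTUs
    genome_size: float  # estimated genome size in bp


def mean_dna_per_cell(otus: Sequence[Otu],
                      unobserved_fraction: float = 0.0,
                      unobserved_genome_size: Optional[float] = None) -> float:
    """Average bp of DNA per cell, including the unobserved part of the community."""
    if not 0.0 <= unobserved_fraction < 1.0:
        raise ValueError("unobserved_fraction must be in [0, 1)")
    total = sum(o.abundance for o in otus)
    observed_mean = sum(o.abundance * o.genome_size for o in otus) / total
    if unobserved_genome_size is None:
        # Assumption: unobserved OTUs resemble the observed ones on average.
        unobserved_genome_size = observed_mean
    return ((1.0 - unobserved_fraction) * observed_mean
            + unobserved_fraction * unobserved_genome_size)


def _cell_fraction(otus: Sequence[Otu], name: str,
                   unobserved_fraction: float) -> float:
    """Fraction of all cells (observed and unobserved) belonging to one OTU."""
    total = sum(o.abundance for o in otus)
    target = next(o for o in otus if o.name == name)
    return (1.0 - unobserved_fraction) * target.abundance / total


def coverage(otus: Sequence[Otu], name: str, sequenced_bp: float,
             unobserved_fraction: float = 0.0,
             unobserved_genome_size: Optional[float] = None) -> float:
    """Expected fold coverage of one OTU's genome for a given sequencing effort."""
    dna = mean_dna_per_cell(otus, unobserved_fraction, unobserved_genome_size)
    # The genome size cancels: a larger genome attracts more reads
    # but also needs more reads per unit of coverage.
    return sequenced_bp * _cell_fraction(otus, name, unobserved_fraction) / dna


def required_bp(otus: Sequence[Otu], name: str, target_coverage: float,
                unobserved_fraction: float = 0.0,
                unobserved_genome_size: Optional[float] = None) -> float:
    """Sequencing effort (bp) needed to reach target_coverage for one OTU."""
    dna = mean_dna_per_cell(otus, unobserved_fraction, unobserved_genome_size)
    return target_coverage * dna / _cell_fraction(otus, name, unobserved_fraction)


if __name__ == "__main__":
    community = [Otu("A", 0.5, 4e6), Otu("B", 0.5, 2e6)]
    print(required_bp(community, "A", target_coverage=10))
    print(required_bp(community, "A", target_coverage=10, unobserved_fraction=0.2))
```

Under these assumptions, raising `unobserved_fraction` increases the sequencing effort needed for the same target, which is why correcting for unobserved OTUs matters when planning a project.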