Olm Matthew R, Brown Christopher T, Brooks Brandon, Banfield Jillian F
Department of Plant and Microbial Biology, University of California, Berkeley, CA, USA.
Department of Environmental Science, Policy, and Management, University of California, Berkeley, CA, USA.
ISME J. 2017 Dec;11(12):2864-2868. doi: 10.1038/ismej.2017.126. Epub 2017 Jul 25.
The number of microbial genomes sequenced each year is expanding rapidly, in part due to genome-resolved metagenomic studies that routinely recover hundreds of draft-quality genomes. Rapid algorithms have been developed to comprehensively compare large genome sets, but they are not accurate with draft-quality genomes. Here we present dRep, a program that reduces the computational time for pairwise genome comparisons by sequentially applying a fast, inaccurate estimation of genome distance and a slow, accurate measure of average nucleotide identity. dRep achieves a 28× increase in speed with perfect recall and precision when benchmarked against previously developed algorithms. We demonstrate the use of dRep for genome recovery from time-series datasets. Each metagenome was assembled separately, and dRep was used to identify groups of essentially identical genomes and select the best genome from each replicate set. This resulted in recovery of significantly more and higher-quality genomes compared to the set recovered using co-assembly.
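The two-stage idea described above can be illustrated with a minimal Python sketch. It is not dRep's implementation. Every name in it is hypothetical, the toy sequences and quality scores are invented for illustration, and both measures are simple stand-ins: a k-mer Jaccard distance plays the role of the fast, inaccurate estimate, and `difflib.SequenceMatcher` plays the role of the slow, accurate identity measure. The cutoffs are likewise arbitrary assumptions. The point is only that the expensive comparison runs within the coarse clusters rather than on every pair.

```python
from difflib import SequenceMatcher
from itertools import combinations


def kmers(seq, k=4):
    """Return the set of all k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def fast_distance(a, b, k=4):
    """Cheap stand-in estimate: 1 - Jaccard similarity of k-mer sets."""
    ka, kb = kmers(a, k), kmers(b, k)
    return 1.0 - len(ka & kb) / len(ka | kb)


def slow_identity(a, b):
    """Costlier stand-in for an accurate identity measure."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def single_linkage(names, linked):
    """Group names into connected components using linked(x, y) -> bool."""
    parent = {n: n for n in names}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for x, y in combinations(names, 2):
        if linked(x, y):
            parent[find(x)] = find(y)
    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return list(groups.values())


def dereplicate(genomes, quality, fast_cutoff=0.5, identity_cutoff=0.95):
    """Return (best genome per replicate set, number of slow comparisons)."""
    slow_calls = 0

    def roughly_similar(x, y):
        return fast_distance(genomes[x], genomes[y]) <= fast_cutoff

    def nearly_identical(x, y):
        nonlocal slow_calls
        slow_calls += 1
        return slow_identity(genomes[x], genomes[y]) >= identity_cutoff

    winners = []
    # Stage 1: coarse clusters from the cheap estimate over all pairs.
    for cluster in single_linkage(sorted(genomes), roughly_similar):
        # Stage 2: the costly measure runs only inside each coarse cluster.
        for group in single_linkage(cluster, nearly_identical):
            winners.append(max(group, key=lambda n: quality[n]))
    return sorted(winners), slow_calls


# Toy data (invented): A2 is A1 with one substitution; B is unrelated.
genomes = {
    "A1": "ATGCGTACCTGAAGTCGATTGCACGGATCTAGCTTAACGG",
    "A2": "ATGCGTACCTGAAGTCGATTACACGGATCTAGCTTAACGG",
    "B": "CCAGGTTTCAAACCCGTGTGGAGAGTTCCATAAATGGGCC",
}
quality = {"A1": 0.90, "A2": 0.97, "B": 0.85}  # hypothetical quality scores

winners, slow_calls = dereplicate(genomes, quality)
print(winners, slow_calls)  # prints: ['A2', 'B'] 1
```

In this toy case an all-against-all approach would need three slow comparisons, while the two-stage version needs one, because B never shares a coarse cluster with A1 or A2. With many genomes that fall into many small coarse clusters, the saving grows accordingly. Picking the winner by a single quality score is also a simplification made for this sketch.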