Perisin Matthew, Vetter Madlen, Gilbert Jack A, Bergelson Joy
Department of Ecology and Evolution, University of Chicago, Chicago, IL, USA.
Committee on Microbiology, University of Chicago, Chicago, IL, USA.
ISME J. 2016 Apr;10(4):1020-4. doi: 10.1038/ismej.2015.161. Epub 2015 Sep 11.
The 16S rRNA gene (16S) is an accepted marker of bacterial taxonomic diversity, even though differences in copy number obscure the relationship between amplicon and organismal abundances. Ancestral state reconstruction methods can predict 16S copy numbers through comparisons with closely related reference genomes; however, the database of closed genomes is limited. Here, we extend the reference database of 16S copy numbers to de novo assembled draft genomes by developing 16Stimator, a method to estimate 16S copy numbers when these repetitive regions collapse during assembly. Using a read depth approach, we estimate 16S copy numbers for 12 endophytic isolates from Arabidopsis thaliana and confirm estimates by qPCR. We further apply this approach to draft genomes deposited in NCBI and demonstrate accurate copy number estimation regardless of sequencing platform, with an overall median deviation of 14%. The expanded database of isolates with 16S copy number estimates increases the power of phylogenetic correction methods for determining organismal abundances from 16S amplicon surveys.
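The read depth idea in the abstract can be illustrated with a minimal sketch. This is not the 16Stimator implementation, which the abstract describes only as a statistical estimation method. It assumes a simplified model in which all 16S copies collapse into a single assembled region, so the ratio of median depth over that region to median depth over single-copy regions approximates the copy number. The function names, the use of medians, the relative-deviation formula, and the depth values are all hypothetical choices made for illustration.

```python
from statistics import median


def estimate_copy_number(depth_16s, depth_single_copy):
    """Estimate 16S copy number as a ratio of median read depths.

    depth_16s: per-base read depths across the collapsed 16S region.
    depth_single_copy: per-base read depths across regions assumed to be
    present once per genome (hypothetical baseline choice).
    """
    if not depth_16s or not depth_single_copy:
        raise ValueError("both depth lists must be non-empty")
    baseline = median(depth_single_copy)
    if baseline <= 0:
        raise ValueError("single-copy baseline depth must be positive")
    return median(depth_16s) / baseline


def percent_deviation(estimate, reference):
    """Absolute deviation of an estimate from a reference value, in percent.

    Assumed definition; the abstract does not state how deviation is computed.
    """
    if reference <= 0:
        raise ValueError("reference copy number must be positive")
    return abs(estimate - reference) / reference * 100


if __name__ == "__main__":
    # Made-up depths for illustration only, not data from the study.
    single_copy = [48, 50, 52, 49, 51]
    collapsed_16s = [345, 350, 355, 352, 348]
    estimate = estimate_copy_number(collapsed_16s, single_copy)
    print(f"estimated 16S copies: {estimate:.1f}")
    print(f"deviation from a reference of 6: {percent_deviation(estimate, 6):.1f}%")
```

In practice the per-base depths would come from mapping the sequencing reads back to the draft assembly, and medians are used here only because they are less sensitive to local coverage spikes than means.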