Improved taxonomic assignment of human intestinal 16S rRNA sequences by a dedicated reference database.
Author information
Ritari Jarmo, Salojärvi Jarkko, Lahti Leo, de Vos Willem M
Affiliations
Department of Veterinary Biosciences, University of Helsinki, Helsinki, Finland.
Laboratory of Microbiology, Wageningen University, Wageningen, the Netherlands.
Publication information
BMC Genomics. 2015 Dec 12;16:1056. doi: 10.1186/s12864-015-2265-y.
BACKGROUND
Current sequencing technology enables taxonomic profiling of microbial ecosystems at high resolution and depth by using the 16S rRNA gene as a phylogenetic marker. Taxonomic assignment of newly acquired data is based on sequence comparisons with comprehensive reference databases to find a consensus taxonomy for representative sequences. Nevertheless, even for well-characterised ecosystems such as the human intestinal microbiota, it is challenging to assign genus- and species-level taxonomy to 16S rRNA amplicon reads. Part of the explanation may lie in the sheer size of the search space, where competition from a multitude of highly similar sequences may prevent reliable assignment at low taxonomic levels. However, when studying a particular environment such as the human intestine, it can be argued that a reference database comprising only sequences native to that environment would be sufficient, effectively reducing the search space.
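The effect of search-space size on consensus taxonomy can be illustrated with a toy example. The sketch below is not the method used in the study; it assumes a simple rule in which a rank is accepted only when a minimum fraction of the database hits agree on it. The function name `consensus_lineage`, the 0.9 threshold, and all lineage labels are hypothetical placeholders.

```python
from collections import Counter

# Ranks in order, used only to bound the lineage depth in this toy example.
RANKS = ["phylum", "class", "order", "family", "genus", "species"]

def consensus_lineage(hit_lineages, min_fraction=0.9):
    """Return the longest lineage prefix shared by at least min_fraction of hits.

    hit_lineages: list of lineages, each a list of taxon names ordered as RANKS.
    The agreement rule and threshold are assumptions made for illustration.
    """
    consensus = []
    if not hit_lineages:
        return consensus
    for depth in range(len(RANKS)):
        # Count lineage prefixes down to this rank; shorter lineages are skipped.
        prefixes = Counter(
            tuple(lineage[:depth + 1])
            for lineage in hit_lineages
            if len(lineage) > depth
        )
        if not prefixes:
            break
        prefix, count = prefixes.most_common(1)[0]
        if count / len(hit_lineages) < min_fraction:
            break  # hits disagree at this rank, so stop at the previous one
        consensus = list(prefix)
    return consensus

# Placeholder lineage shared by all hits down to genus level.
base = ["Phylum_X", "Class_X", "Order_X", "Family_X", "Genus_A"]

# Hits from a broad database: near-identical sequences from several species.
broad_hits = [base + ["species_1"], base + ["species_2"],
              base + ["species_3"], base + ["species_1"]]

# Hits from an environment-specific database: only the native species remains.
restricted_hits = [base + ["species_1"], base + ["species_1"]]

print(consensus_lineage(broad_hits))       # resolves to genus only
print(consensus_lineage(restricted_hits))  # resolves to species
```

With the competing species removed from the reference set, the same query reaches species level, which is the intuition behind restricting the database to sequences native to the environment.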
RESULTS
We constructed a 16S rRNA gene database based on high-quality sequences specific to the human intestinal microbiota, resulting in a curated data set of 2473 unique prokaryotic species-like groups and their taxonomic lineages, and compared its performance against the Greengenes and Silva databases. The results showed that, regardless of the assignment algorithm used, our database improved taxonomic assignment of 16S rRNA sequencing data by enabling significantly higher species- and genus-level assignment rates while preserving taxonomic diversity and demanding fewer computational resources.
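The genus- and species-level assignment rate mentioned above can be computed as the fraction of reads whose lineage is resolved down to a given rank. The following is a minimal sketch under that assumption; the function name `assignment_rate` and the toy lineages are invented for illustration and do not reproduce any data or results from the study.

```python
def assignment_rate(lineages, rank_index):
    """Fraction of reads whose lineage is resolved at least to rank_index.

    lineages: one list of taxon names per read, ordered from phylum downwards.
    rank_index: 0-based rank position (here 4 = genus, 5 = species).
    """
    if not lineages:
        return 0.0
    assigned = sum(
        1 for lineage in lineages
        if len(lineage) > rank_index and lineage[rank_index]
    )
    return assigned / len(lineages)

# Placeholder lineage prefix (phylum to family) shared by all toy reads.
base = ["P", "C", "O", "F"]

# Invented assignments of the same four reads against two hypothetical databases.
broad_db = [base + ["G1"], base, base + ["G1", "s1"], base + ["G2"]]
dedicated_db = [base + ["G1", "s1"], base + ["G2"],
                base + ["G1", "s1"], base + ["G2", "s2"]]

GENUS, SPECIES = 4, 5
for name, result in [("broad", broad_db), ("dedicated", dedicated_db)]:
    print(name,
          "genus rate:", assignment_rate(result, GENUS),
          "species rate:", assignment_rate(result, SPECIES))
```

Comparing these rates for the same reads across reference databases is one simple way to express the kind of improvement reported here, although the study's own evaluation procedure may differ.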
CONCLUSION
The curated human intestinal 16S rRNA gene taxonomic database of about 2500 species-like groups described here provides a practical way to significantly improve taxonomic assignment in phylogenetic studies of the human intestinal microbiota.