Taxonomy annotation and guide tree errors in 16S rRNA databases.

Author information

Edgar Robert

Affiliation

Sonoma, CA, USA.

Publication information

PeerJ. 2018 Jun 12;6:e5030. doi: 10.7717/peerj.5030. eCollection 2018.

Abstract

Sequencing of the 16S ribosomal RNA (rRNA) gene is widely used to survey microbial communities. Specialized 16S rRNA databases have been developed to support this approach including Greengenes, RDP and SILVA. Most taxonomy annotations in these databases are predictions from sequence rather than authoritative assignments based on studies of type strains or isolates. In this work, I investigated the taxonomy annotations and guide trees provided by these databases. Using a blinded test, I estimated that the annotation error rate of the RDP database is ∼10%. The branching orders of the Greengenes and SILVA guide trees were found to disagree at comparable rates with each other and with taxonomy annotations according to the training set (authoritative reference) provided by RDP, indicating that the trees have comparable quality. Pervasive conflicts between tree branching order and type strain taxonomies strongly suggest that the guide trees are unreliable guides to phylogeny. I found 249,490 identical sequences with conflicting annotations in SILVA v128 and Greengenes v13.5 at ranks up to phylum (7,804 conflicts), indicating that the annotation error rate in these databases is ∼17%.
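To make the "identical sequences with conflicting annotations" check concrete, here is a minimal Python sketch of one way to run such a comparison. It is not the author's pipeline. It assumes a hypothetical input format in which each database is a dict mapping sequence to a semicolon-separated lineage. It also assumes that ranks are listed from domain downward, that a Greengenes-style `x__` prefix may be present, and that an empty name (e.g. `g__`) means the rank is unannotated. The helper names `normalize_seq`, `parse_lineage`, `highest_conflict_rank` and `find_conflicts` are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical rank order; assumes lineages list ranks from domain downward.
RANKS = ["domain", "phylum", "class", "order", "family", "genus"]


def normalize_seq(seq: str) -> str:
    """Uppercase and map U to T so RNA- and DNA-alphabet copies compare equal."""
    return seq.upper().replace("U", "T")


def parse_lineage(lineage: str) -> List[str]:
    """Split 'k__Bacteria; p__Firmicutes' or 'Bacteria;Firmicutes' into bare names.

    Greengenes-style 'x__' prefixes are stripped; an empty name (e.g. 'g__')
    is kept as '' so rank positions stay aligned, and means 'unannotated'.
    """
    names = []
    for field in lineage.split(";"):
        name = field.strip()
        if len(name) >= 3 and name[1:3] == "__":
            name = name[3:]
        names.append(name)
    return names


def highest_conflict_rank(a: str, b: str) -> Optional[str]:
    """Return the highest rank where both lineages are annotated but differ."""
    for rank, x, y in zip(RANKS, parse_lineage(a), parse_lineage(b)):
        if x and y and x != y:
            return rank
    return None


def find_conflicts(db1: Dict[str, str], db2: Dict[str, str]) -> List[Tuple[str, str]]:
    """Report (sequence, highest conflicting rank) for sequences present in both.

    db1 and db2 map sequence -> lineage string (a hypothetical input format).
    """
    norm1 = {normalize_seq(s): t for s, t in db1.items()}
    norm2 = {normalize_seq(s): t for s, t in db2.items()}
    conflicts = []
    # Only exact (normalized) sequence matches are compared.
    for seq in sorted(norm1.keys() & norm2.keys()):
        rank = highest_conflict_rank(norm1[seq], norm2[seq])
        if rank is not None:
            conflicts.append((seq, rank))
    return conflicts


if __name__ == "__main__":
    silva_like = {"ACGU": "Bacteria;Firmicutes;Bacilli", "GGCC": "Bacteria;Proteobacteria"}
    gg_like = {"ACGT": "k__Bacteria; p__Actinobacteria; c__",
               "GGCC": "k__Bacteria; p__Proteobacteria"}
    print(find_conflicts(silva_like, gg_like))  # [('ACGT', 'phylum')]
```

The toy data are invented for illustration. The sketch only counts disagreements. It does not reconcile synonymous names that differ between taxonomies, it keeps a single lineage per distinct sequence, and it does not say which annotation is wrong. Turning such counts into an error-rate estimate, as the abstract does, requires additional reasoning that is not reproduced here.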
