Construction & assessment of a unified curated reference database for improving the taxonomic classification of bacteria using 16S rRNA sequence data.

Author information

Biomedical Informatics Centre, Sanjay Gandhi Postgraduate Institute of Medical Sciences, Lucknow, Uttar Pradesh, India.

Biomedical Informatics Centre; Department of Gastroenterology, Sanjay Gandhi Postgraduate Institute of Medical Sciences, Lucknow, Uttar Pradesh, India.

Publication information

Indian J Med Res. 2020 Jan;151(1):93-103. doi: 10.4103/ijmr.IJMR_220_18.

DOI:10.4103/ijmr.IJMR_220_18

PMID:32134020

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC7055167/

Abstract

BACKGROUND & OBJECTIVES

For bacterial community analysis, 16S rRNA sequences are subjected to taxonomic classification through comparison with one of the three commonly used databases [Greengenes, SILVA and Ribosomal Database Project (RDP)]. It was hypothesized that a unified database containing fully annotated, non-redundant sequences from all the three databases, might provide better taxonomic classification during analysis of 16S rRNA sequence data. Hence, a unified 16S rRNA database was constructed and its performance was assessed by using it with four different taxonomic assignment methods, and for data from various hypervariable regions (HVRs) of 16S rRNA gene.

METHODS

We constructed a unified 16S rRNA database (16S-UDb) by merging non-ambiguous, fully annotated, full-length 16S rRNA sequences from the three databases and compared its performance in taxonomy assignment with that of three original databases. This was done using four different taxonomy assignment methods [mothur Naïve Bayesian Classifier (mothur-nbc), RDP Naïve Bayesian Classifier (rdp-nbc), UCLUST, SortMeRNA] and data from 13 regions of 16S rRNA [seven hypervariable regions (HVR) (V2-V8) and six pairs of adjacent HVRs].
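The merging step described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the abstract does not state the exact filtering criteria, so the `Record` type, the `MIN_LENGTH` cutoff, the seven-rank lineage, and the rule that exact sequence duplicates are collapsed are all assumptions made for illustration.

```python
from typing import Iterable, List, NamedTuple, Tuple


class Record(NamedTuple):
    """Hypothetical reference entry: an ID, a sequence, and a rank-ordered lineage."""
    seq_id: str
    sequence: str
    lineage: Tuple[str, ...]  # assumed order: kingdom ... genus, species


MIN_LENGTH = 1200  # assumed "full-length" cutoff; not taken from the paper
N_RANKS = 7        # assumed number of ranks for a "fully annotated" entry


def is_unambiguous(seq: str) -> bool:
    """True if the sequence contains only A, C, G, T (no N or other IUPAC codes)."""
    return set(seq) <= set("ACGT")


def is_fully_annotated(lineage: Tuple[str, ...]) -> bool:
    """True if every rank down to species carries a non-empty name."""
    return len(lineage) == N_RANKS and all(name.strip() for name in lineage)


def build_unified(*databases: Iterable[Record],
                  min_length: int = MIN_LENGTH) -> List[Record]:
    """Merge several databases, keeping one record per distinct sequence."""
    unified = {}
    for db in databases:
        for rec in db:
            # Normalise case and RNA/DNA alphabet before comparing sequences.
            seq = rec.sequence.upper().replace("U", "T")
            if len(seq) < min_length:
                continue
            if not is_unambiguous(seq) or not is_fully_annotated(rec.lineage):
                continue
            # The first database to contribute a sequence wins; later
            # identical sequences are treated as redundant.
            unified.setdefault(seq, rec._replace(sequence=seq))
    return list(unified.values())
```

Calling `build_unified(greengenes, silva, rdp)` on three iterables of `Record` objects would return the filtered, de-duplicated list. A real pipeline would also need to reconcile conflicting taxonomic labels for identical sequences, which this sketch ignores.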

RESULTS

Our unified 16S rRNA database contained 13,078 full-length, fully annotated 16S rRNA sequences. It could assign genus and species to larger proportions (90.05 and 46.82%, respectively, when used with mothur-nbc classifier and the V2+V3 region) of sequences in the test database than the three original 16S rRNA databases (70.88-87.20% and 10.23-24.28%, respectively, with the same classifier and region).
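The proportions reported above are the share of test sequences that received a genus or species label. A minimal sketch of that calculation follows, assuming each classifier result is a rank-ordered lineage tuple and that a missing, empty, or "unclassified" entry counts as unassigned. The function name and these conventions are illustrative, not taken from the paper or from any of the four tools.

```python
from typing import Sequence, Tuple

GENUS, SPECIES = 5, 6  # assumed positions in a seven-rank lineage


def assigned_percent(assignments: Sequence[Tuple[str, ...]], rank_index: int) -> float:
    """Percentage of lineages that carry a usable name at the given rank."""
    if not assignments:
        return 0.0
    assigned = 0
    for lineage in assignments:
        # A lineage shorter than the requested rank has no label there.
        name = lineage[rank_index].strip() if len(lineage) > rank_index else ""
        if name and not name.lower().startswith("unclassified"):
            assigned += 1
    return 100.0 * assigned / len(assignments)
```

Applied to the classifier output for one database and one region, `assigned_percent(results, GENUS)` and `assigned_percent(results, SPECIES)` would give the two figures being compared across databases.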

INTERPRETATION & CONCLUSIONS

Our results indicate that for analysis of bacterial mixtures, sequencing of V2-V3 region of 16S rRNA followed by analysis of the data using the mothur-nbc classifier and our 16S-UDb database may be preferred.

