Department of Veterans Affairs New York Harbor Health System, 423 E 23rd Street, New York, NY 10010, USA.
World J Gastroenterol. 2010 Sep 7;16(33):4135-44. doi: 10.3748/wjg.v16.i33.4135.
To design and validate broad-range 16S rRNA primers for use in high throughput sequencing to classify bacteria isolated from the human foregut microbiome.
A foregut microbiome dataset was constructed from 16S rRNA gene sequences of oral, esophageal, and gastric microbiomes that had been produced by Sanger sequencing in previous studies and that represented 219 bacterial species. Candidate primers were taken from the European rRNA database. To assess the effect of sequence length on classification accuracy, 16S rRNA gene sequences of various lengths were created by trimming the full-length sequences. Sequences spanning various hypervariable regions were selected to simulate the amplicons that would be obtained with possible primer pairs. The trimmed sequences were compared with the full-length 16S rRNA genes for accuracy of taxonomic classification using online software at the Ribosomal Database Project (RDP). The universality of the primer set was evaluated against the RDP 16S rRNA database, which comprises 433 306 16S rRNA genes representing 36 phyla.
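The trimming step can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names (`forward_read`, `reverse_read`) are hypothetical, the toy sequences are invented, and it assumes each input sequence is already numbered so that its coordinates match the reference positions (in practice this would require an alignment to the Escherichia coli 16S rRNA gene).

```python
# Illustrative sketch only: simulate truncated reads from a longer sequence.
# Assumption: sequence coordinates already correspond to reference numbering.

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def forward_read(seq: str, start: int, length: int) -> str:
    """Return `length` nucleotides downstream from 1-based position `start`."""
    begin = start - 1  # convert 1-based position to 0-based index
    return seq[begin:begin + length]


def reverse_read(amplicon: str, length: int) -> str:
    """Return the first `length` nucleotides read from the reverse primer end."""
    return reverse_complement(amplicon)[:length]


if __name__ == "__main__":
    toy = "ACGTACGTAC"  # invented toy sequence, not a real 16S rRNA gene
    print(forward_read(toy, 3, 4))
    print(reverse_read("AAACCCGG", 3))
```

With real data, the same idea would be applied with a start position of 28 and lengths of 100, 350, or 900 nt, and each truncated sequence would then be submitted to a classifier for comparison with the full-length result.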
Truncation to 100 nucleotides (nt) downstream from the position corresponding to base 28 in the Escherichia coli 16S rRNA gene caused misclassification of 87 (39.7%) of the 219 sequences, compared with misclassification of only 29 (13.2%) sequences with truncation to 350 nt. Among 350-nt sequence reads within various regions of the 16S rRNA gene, the reverse read of an amplicon generated using the 343F/798R primers had the least effect on classification (8.2% misclassified). In comparison, truncation to 900 nt, mimicking single-pass Sanger reads, misclassified 5.0% of the 219 sequences. The 343F/798R amplicon accurately assigned 91.8% of the 219 sequences at the species level. When weighted by the abundance of the species in the esophageal dataset, the 343F/798R amplicon yielded similar classification accuracy without a significant loss in species coverage (92%). Modification of the 343F/798R primers to 347F/803R increased their universality among foregut species. Assuming that a typical polymerase chain reaction can tolerate 2 mismatches between a primer and a template, the modified 347F and 803R primers should be able to anneal to 98% and 99.6%, respectively, of all 16S rRNA genes in the RDP database.
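The 2-mismatch criterion can be sketched in Python as follows. This is a hedged illustration rather than the method used in the study: the names `count_mismatches` and `fraction_annealing` are hypothetical, the sequences are invented toy data (not the actual 347F/803R primers), and the binding sites are assumed to be pre-extracted, the same length as the primer, and already oriented to read in the same direction as the primer.

```python
# Illustrative sketch only: fraction of template sites a primer could anneal to
# if up to `max_mismatches` mismatches are tolerated.

# IUPAC degenerate nucleotide codes mapped to the bases they can represent.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def count_mismatches(primer: str, site: str) -> int:
    """Count positions where the site base is not allowed by the primer code."""
    if len(primer) != len(site):
        raise ValueError("primer and site must have the same length")
    return sum(
        1
        for p, s in zip(primer.upper(), site.upper())
        if s not in IUPAC.get(p, "")
    )


def fraction_annealing(primer: str, sites: list[str], max_mismatches: int = 2) -> float:
    """Return the fraction of sites with at most `max_mismatches` mismatches."""
    if not sites:
        return 0.0
    hits = sum(1 for s in sites if count_mismatches(primer, s) <= max_mismatches)
    return hits / len(sites)


if __name__ == "__main__":
    toy_primer = "ACGTAC"  # invented toy primer
    toy_sites = ["ACGTAC", "ACGTTT", "TTTTAC", "GGGGGG"]  # invented toy sites
    print(fraction_annealing(toy_primer, toy_sites))
```

In the toy data, the four sites carry 0, 2, 3, and 5 mismatches, so two of the four pass the threshold. Applied to primer-binding sites extracted from a large reference database, the same calculation would give a universality estimate of the kind reported above.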
347F/803R is the most suitable primer pair for classification of foregut 16S rRNA genes and also possesses universality suitable for analyses of other complex microbiomes.