Primer, Pipelines, Parameters: Issues in 16S rRNA Gene Sequencing.

Affiliations

Core Facility Microbiome, ZIEL-Institute for Food & Health, Technische Universität München, Freising, Germany.

Chair of Experimental Bioinformatics, TUM School of Life Sciences Weihenstephan, Technische Universität München, Freising, Germany.

Publication information

mSphere. 2021 Feb 24;6(1):e01202-20. doi: 10.1128/mSphere.01202-20.

Abstract

Short-amplicon 16S rRNA gene sequencing is currently the method of choice for studies investigating microbiomes. However, comparative studies on differences in procedures are scarce. We sequenced human stool samples and mock communities with increasing complexity using a variety of commonly used protocols. Short amplicons targeting different variable regions (V-regions) or ranges thereof (V1-V2, V1-V3, V3-V4, V4, V4-V5, V6-V8, and V7-V9) were investigated for differences in the composition outcome due to primer choices. Next, the influence of clustering (operational taxonomic units [OTUs], zero-radius OTUs [zOTUs], and amplicon sequence variants [ASVs]), different databases (GreenGenes, the Ribosomal Database Project, Silva, the genomic-based 16S rRNA Database, and The All-Species Living Tree), and bioinformatic settings on taxonomic assignment were also investigated. We present a systematic comparison across all typically used V-regions using well-established primers. While it is known that the primer choice has a significant influence on the resulting microbial composition, we show that microbial profiles generated using different primer pairs need independent validation of performance. Further, comparing data sets across V-regions using different databases might be misleading due to differences in nomenclature (e.g., versus ) and varying precisions in classification down to genus level. Overall, specific but important taxa are not picked up by certain primer pairs (e.g., is missed using primers 515F-944R) or due to the database used (e.g., in GreenGenes and the genomic-based 16S rRNA Database). We found that appropriate truncation of amplicons is essential and different truncated-length combinations should be tested for each study. Finally, specific mock communities of sufficient and adequate complexity are highly recommended. 
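The recommendation to test several truncated-length combinations can be illustrated with a minimal sketch. It is not the authors' pipeline. It only enumerates forward/reverse truncation lengths and keeps the pairs whose reads would still overlap enough to be merged. The function name `truncation_grid`, the amplicon length of 420 bp, the candidate lengths, and the 12 bp minimum overlap are all assumptions chosen for illustration.

```python
from itertools import product


def truncation_grid(amplicon_len, fwd_lengths, rev_lengths, min_overlap=12):
    """List (fwd, rev, overlap) for truncation pairs that still allow merging.

    amplicon_len: expected amplicon length in bp (hypothetical value supplied by the caller)
    fwd_lengths / rev_lengths: candidate truncation lengths for forward / reverse reads
    min_overlap: smallest overlap in bp considered sufficient (assumed threshold)
    """
    usable = []
    for fwd, rev in product(fwd_lengths, rev_lengths):
        # Paired reads cover the amplicon from both ends, so the overlap is
        # whatever the two read lengths add up to beyond the amplicon length.
        overlap = fwd + rev - amplicon_len
        if overlap >= min_overlap:
            usable.append((fwd, rev, overlap))
    return usable


if __name__ == "__main__":
    # Illustrative numbers only; real values depend on primers and read quality.
    for fwd, rev, overlap in truncation_grid(420, [240, 260, 280], [180, 200, 220]):
        print(f"fwd={fwd} rev={rev} overlap={overlap}")
```

Each surviving combination would then be run through the chosen pipeline and judged against a mock community, since sufficient overlap alone says nothing about read quality at the truncated positions.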
In 16S rRNA gene sequencing, certain bacterial genera were found to be underrepresented or even missing in taxonomic profiles when using unsuitable primer combinations, outdated reference databases, or inadequate pipeline settings. Concerning the last, quality thresholds as well as bioinformatic settings (i.e., clustering approach, analysis pipeline, and specific adjustments such as truncation) are responsible for a number of observed differences between studies. Conclusions drawn by comparing one data set to another (e.g., between publications) appear to be problematic and require independent cross-validation using matching V-regions and uniform data processing. Therefore, we highlight the importance of a thought-out study design including sufficiently complex mock standards and appropriate V-region choice for the sample of interest. The use of processing pipelines and parameters must be tested beforehand.
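One way to make use of a mock standard, sketched here under stated assumptions, is to compare the known composition of the mock community with the profile a given primer, database, and pipeline combination returns, and to flag genera that are missing or underrepresented. The function name `compare_to_mock`, the placeholder genus labels, and the 0.5 ratio threshold are hypothetical and are not taken from the study.

```python
def compare_to_mock(expected, observed, min_ratio=0.5):
    """Label each expected genus as 'missing', 'underrepresented', or 'ok'.

    expected: dict of genus -> known relative abundance in the mock (must be > 0)
    observed: dict of genus -> relative abundance reported by the pipeline
    min_ratio: observed/expected ratio below which a genus is flagged (assumed threshold)
    """
    status = {}
    for genus, expected_abundance in expected.items():
        observed_abundance = observed.get(genus, 0.0)
        if observed_abundance == 0.0:
            status[genus] = "missing"
        elif observed_abundance / expected_abundance < min_ratio:
            status[genus] = "underrepresented"
        else:
            status[genus] = "ok"
    return status


if __name__ == "__main__":
    # Placeholder labels and abundances, for illustration only.
    mock = {"Genus_A": 0.5, "Genus_B": 0.3, "Genus_C": 0.2}
    profile = {"Genus_A": 0.7, "Genus_B": 0.1}
    for genus, label in compare_to_mock(mock, profile).items():
        print(genus, label)
```

Running the same check for each V-region, database, and parameter set gives a simple basis for deciding which combination to validate further. Note that a genus can also appear as missing only because a database uses a different name for it, so flagged genera should be checked against nomenclature differences.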

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9a7e/8544895/c1813486c059/msphere.01202-20-f0001.jpg
