Baer Mareike, Höppe Lisa, Seel Waldemar, Lipski André
Institute of Nutritional and Food Sciences, Food Microbiology and Hygiene, University of Bonn, Friedrich-Hirzebruch-Allee 7, 53115, Bonn, Germany.
Institute of Nutritional and Food Sciences, Nutrition and Microbiota, University of Bonn, Katzenburgweg 7, 53115, Bonn, Germany.
BMC Microbiol. 2024 Dec 6;24(1):521. doi: 10.1186/s12866-024-03677-8.
Long-read 16S rRNA gene amplicon sequencing has high potential for characterizing food-associated microbiomes. Its advantage lies in sequencing the full-length (approximately 1,500 bp) gene, which enables taxonomic resolution at the species level. Here we present a benchmarking study using mock communities representative of milking machine biofilms and raw meat, which reveals challenges relevant to food-associated habitats: varying species abundances, reliable differentiation of species within a genus, and detection of novel species with < 98.7% sequence identity to type strains. By using mock communities at different levels of preparation (mixed whole cells, mixed extracted DNA, and mixed PCR products), we systematically investigated how DNA extraction with two different kits, PCR amplification of 16S rRNA genes, sequencing, and bioinformatic analysis (including the choice of reference database and gene copy number normalization) influence the observed bacterial composition and alpha diversity.
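The final sentence names gene copy number normalization and alpha diversity as analysis steps. The following is a minimal Python sketch of both, assuming that normalization means dividing each taxon's read count by its 16S rRNA gene copy number before re-computing relative abundances, and using the Shannon index as the alpha diversity measure. The taxon names, read counts, and copy numbers are hypothetical illustration values, not data from this study, and the sketch does not reproduce the study's own pipeline.

```python
import math


def normalize_by_copy_number(read_counts, copy_numbers):
    """Divide each taxon's read count by its (assumed) 16S rRNA gene copy number
    and return relative abundances that sum to 1."""
    corrected = {taxon: count / copy_numbers[taxon] for taxon, count in read_counts.items()}
    total = sum(corrected.values())
    return {taxon: value / total for taxon, value in corrected.items()}


def shannon_index(abundances):
    """Shannon diversity H' = -sum(p_i * ln p_i) over taxa with p_i > 0."""
    total = sum(abundances.values())
    h = 0.0
    for value in abundances.values():
        if value > 0:
            p = value / total
            h -= p * math.log(p)
    return h


# Hypothetical read counts and copy numbers (illustration only, not study data)
reads = {"taxon_A": 600, "taxon_B": 300, "taxon_C": 100}
copies = {"taxon_A": 6, "taxon_B": 3, "taxon_C": 1}

rel = normalize_by_copy_number(reads, copies)
print(round(shannon_index(reads), 4))  # 0.8979 (raw read proportions)
print(round(shannon_index(rel), 4))    # 1.0986 (copy-number-corrected, equal abundances)
```

With these made-up numbers the raw read proportions (0.6, 0.3, 0.1) give H' ≈ 0.898, whereas the corrected proportions are equal and give H' = ln 3 ≈ 1.099, showing how copy number normalization can shift both composition and alpha diversity.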
We demonstrated that PacBio ccs-reads allowed correct taxonomic assignment of all species present in the mock communities when a custom RefSeq database was used. However, the choice of percent identity thresholds for taxonomic assignment strongly influenced the identification and processing of reads from novel species. PCR amplification of 16S rRNA genes introduced the strongest bias into the observed community composition, whereas sequencing alone reproduced the preset composition well. The PCR bias can be attributed in part to differences in the mol% G + C content of 16S rRNA genes, which resulted in preferential amplification of taxa with low mol% G + C content.
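The effect of the identity threshold and the mol% G + C calculation can be illustrated with a small Python sketch. It assumes one best reference hit per read with a precomputed percent identity. The 98.7% species threshold is the value quoted above; the 94.5% genus threshold is a commonly cited literature value assumed here, not a setting from this study. The species name and identity values are hypothetical. The `gc_content` helper only computes mol% G + C of a sequence and does not model PCR amplification.

```python
def assign_taxon(best_hit, identity, species_threshold=98.7, genus_threshold=94.5):
    """Assign a read from its best reference hit ('Genus species') and percent identity.

    Thresholds are assumptions: 98.7 % for species level (as quoted in the abstract),
    94.5 % for genus level (a commonly cited literature value, not from this study).
    """
    genus = best_hit.split()[0]
    if identity >= species_threshold:
        return best_hit          # species-level assignment
    if identity >= genus_threshold:
        return f"{genus} sp."    # candidate novel species within a known genus
    return "unassigned"          # too distant for a genus-level call


def gc_content(seq):
    """Return mol% G + C over unambiguous bases (A, C, G, T) of a DNA sequence."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


# Hypothetical hits: the same 97.9 % read stays at genus level with the default
# threshold but is forced onto a described species if the threshold is lowered.
print(assign_taxon("Pseudomonas fragi", 99.6))                          # Pseudomonas fragi
print(assign_taxon("Pseudomonas fragi", 97.9))                          # Pseudomonas sp.
print(assign_taxon("Pseudomonas fragi", 97.9, species_threshold=97.0))  # Pseudomonas fragi
print(gc_content("ACGTGGCC"))                                           # 75.0
```

Lowering `species_threshold` silently turns a probable novel species into a described one, which is one way the choice of identity values can change how reads from novel species are processed.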
This study underlines the importance of benchmarking with mock communities that represent the habitat of interest, so that the methodology can be evaluated before real samples of unknown composition are analyzed. It also demonstrates the advantage of long-read over short-read sequencing: species-level identification enables in-depth characterization of the habitat. One benefit is improved risk assessment, because pathogenic and apathogenic species of the same genus can be differentiated.